Vol. 44, Parts 1 and 2




BIOMETRIKA


June 1957 


FOUNDED BY 


W. F. R. WELDON, FRANCIS GALTON and KARL PEARSON


MANAGING EDITOR 


E. S. PEARSON


ASSOCIATE EDITOR 
M. G. KENDALL 


ISSUED BY 
THE BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON 


PRINTED AT THE UNIVERSITY PRESS, CAMBRIDGE 


[Issued 4 June 1957] 








This volume of Biometrika is published with the co-operation of 


F. N. DAVID D. G. KENDALL 
J. DURBIN D. V. LINDLEY 
M. J. R. HEALY R. L. PLACKETT 


J. B. S. HALDANE 
A volume containing about 500 pages will be published annually in two half- 
yearly issues, appearing in June and December. 
Papers for publication should be sent either to 


PROFESSOR E. S. PEARSON 
Department of Statistics, University College, London, W.C. 1 
or if more convenient to 


PROFESSOR M. G. KENDALL 
London School of Economics, Houghton Street, London, W.C. 2 


It is a condition of publication in Biometrika that the paper shall not already 
have been issued elsewhere, and will not be reprinted without leave of the 
Editors. 

Contributors receive 25 copies of their papers free. Joint authors 15 copies 
each. Order forms for separates are sent to authors with proofs of their 
papers. 

The Subscription price, payable in advance, is: 

Inland: 45s. net per volume    Abroad: 54s. net per volume
including packing and postage 
BACK ISSUES 
Volumes 20-43 

These may be obtained from the BIOMETRIKA OFFICE, at the following 
prices:

Vols. 20-38: £5 per volume    Vols. 39-43: £3. 3s. per volume

packing and postage 2s. per volume 
Bound volumes: £1 extra per volume Binding cases: 10s. each 


Cheques should be made payable to Biometrika, crossed “a/c BIOMETRIKA 
TRUST” and sent to 
THE SECRETARY 


BIOMETRIKA OFFICE 
UNIVERSITY COLLEGE 
GOWER STREET, LONDON, W.C.1 


to whom all orders for series, single copies and offprints should be addressed. 
All foreign cheques must be drawn in sterling and on a Bank having a 
London Agency. 













Volumes 1-19 
Permission for reprinting has been granted to Messrs Wm. Dawson & 
Sons Ltd. Volumes 1-13 are now ready for distribution and Volumes
14-19 are in course of preparation. Would librarians and others wishing
to have copies please place their orders now with:


Wm. DAWSON & SONS LTD., 


4 DUKE STREET, MANCHESTER SQUARE, 
LONDON, W.1 






































JOHN WISHART 
1898-1956 













Volume 44, Parts 1 and 2    June 1957





JOHN WISHART 
1898-1956 


There are probably few statisticians who have had more friends scattered across the world 
than had John Wishart. Many of these friendships were made in the course of his work 
as a teacher of statistical method to practical agriculturalists overseas. It was in the middle 
of teaching work of this kind, which perhaps he had come to enjoy more than any other and 
in which he undoubtedly excelled, that he died suddenly on 14 July 1956 while bathing at 
Acapulco on the Pacific coast of Mexico. A pioneering visit to Nanking University in 1934 
had been followed after the war by visits to Spain in 1947, to the United States in 1949, to 
India in 1954 and then, in this last year, to Mexico, where he was taking a leading part in 
the work of the Training Centre in Experimental Design arranged by the United Nations 
Food and Agricultural Organization. 

John was the second of the four sons of John and Elizabeth Wishart; he was born in 
Montrose on 28 November 1898, the family moving to Perth in 1900. After schooling 
at Perth Academy he went to Edinburgh University in 1916, finally obtaining a 1st Class
Honours Degree in Mathematics and Natural Philosophy in 1922. His university career 
was broken by two years’ service in the Army (1917-19), when he was a subaltern in the 
7th Battalion of the Black Watch, and saw active service in France during the last few 
months of the war. His four years at the university included a teacher’s training course at 
Moray House, which led him on to his first post as a mathematics master at the West 
Leeds High School (1922-4). Here he met Olive Birdsall, whom he married in 1924. There 
were two sons of the marriage. 

When on the look-out for mathematical assistants, Karl Pearson had on several occasions 
asked E. T. Whittaker for a suitable name among his past Edinburgh students, and it was 
with a recommendation of this kind that Wishart came to University College, London, in 
the autumn of 1924 to start on a statistical career. The post which he filled was that of 
Research Assistant to Pearson, the funds being supplied by the Department of Scientific 
and Industrial Research. 

The computation of the Tables of the Incomplete Gamma Function had been finished, the 
Tables being issued in 1922, and Pearson was already preparing the ground for his next 
great computational undertaking, that of the Tables of the Incomplete Beta Function. 
It followed that one of Wishart’s main tasks on arriving in London was to get this work on 
the Beta function under way. I cannot recall how far he got with the tabling programme, 
which was not finally completed till 1932, but there is no doubt that his first years of 
apprenticeship in statistics involved him in much hard computing labour. Three of his 
earliest papers, too, published in Biometrika (1, 2, 5*) were concerned with methods of
approximation to the incomplete Beta function. Both the Gamma and Beta distributions 
and their practical uses were a continuing source of interest to him, and one of his last 
papers published in 1956 (77) was concerned with a new method of approximating to the
integral of the former. 

* Here, and below, the numbers are those given in order of date in the Bibliography on pp. 6-8 
below. 

The portrait facing this page was taken by Bassano in 1951. 




I do not think that Wishart regretted his hard training in computation which had 
started, without machine aid, in Whittaker’s mathematical laboratory. Later on he 
was to play an active part on the British Association’s Mathematical Tables Committee 
(1928-48), of which he was for some years the secretary, and Biometrika owes him much for 
the help, advice and criticism which he gave in the recasting of the Biometrika Tables for 
Statisticians. 

During his three years at University College Wishart attended the two main courses of 
lectures on mathematical statistics which Pearson was giving annually, and he learned how 
to handle mathematical problems and put together a scientific paper, but the terms of his 
appointment as research assistant precluded any form of teaching. He was a teacher by 
temperament and training, and it was, perhaps, the prospect of starting some statistical 
classes at a new centre, rather than any marked advance in salary, which led him in the 
autumn of 1927 to accept a post as Mathematical Demonstrator at the Imperial College of 
Science. 

He had scarcely settled in his new post, however, when a second great opportunity came 
his way and he found himself at the beginning of 1928 appointed as Statistical Assistant 
under R. A. Fisher at Rothamsted. Fisher’s scientific output had at this juncture reached a 
remarkable tempo, and Wishart had the good fortune to see at close quarters and to share 
in the development of much new theory and in the trying out of that theory in practical 
experimentation. 

His very considerable number of publications during the years 1928-32 illustrate well 
Wishart’s fields of research. On the more mathematical side, with Fisher’s encouragement, 
he broke new ground in 1928 with the derivation of the generalized product-moment 
distribution (8) which was to play an important part in the development of multivariate 
analysis. A number of his other papers followed from this beginning (e.g. (10, 13)). Other 
papers (9, 11, 23, 29) were concerned with the properties of the distribution of the multiple
correlation coefficient and allied distributions which Fisher had derived between 1924 and 
1928, using geometrical methods. Wishart took a hand, too, in Fisher’s work of showing 
how the impasse in the theory of sampling moments which seemed to have been reached in 
the 1920’s, could be broken through by a systematic use of k-statistics and the methods of 
combinatorial analysis (12, 21, 24). I well remember how my first gleam of understanding
on partitional problems came from Wishart’s demonstration with match-stick patterns in 
his sitting-room at Harpenden. 

At the same time Wishart was taking his full share in the Statistical Department’s work 
of servicing the experimental programme at Rothamsted. Joint papers such as those with 
Clapham (15), Fisher (17) and Allan (18) were pioneer efforts, written to illustrate how the 
new techniques were working out in practice. In addition, he wrote or shared in writing 
a number of simple expository papers (e.g. (14, 20, 22, 26)). His power of putting things
clearly to his non-mathematical colleagues was certainly appreciated. As Sir John Russell
wrote shortly after Wishart’s death: 


It seems only the other day that he first came to Rothamsted to help in the development of what was 
then a new and untried subject, of the value of which many were yet uncertain. He threw himself 
into the work with energy and enthusiasm, mastered its intricacies and having the gift of exposition 
was able to explain its difficulties to the various young people at Rothamsted who wanted to learn some- 
thing about it but feared it was beyond their grasp. But he succeeded and he helped many. . . We all 
regretted his departure to Cambridge, but of course we knew it was a bigger field giving him more scope. 







In 1931 Udny Yule decided to give up his post of University Lecturer in Statistics at 
Cambridge which he had held since 1912, and a committee of the University recommended 
some reorganization of the teaching work. A Reader* was to be appointed in the Faculty 
of Agriculture, who should also do some teaching in the Faculty of Mathematics, while 
Economic Statistics was to be the concern of a separate Lecturer. Wishart was appointed 
to take up the Readership in October 1931; there was no one more suited for the post. He 
had behind him 3½ years’ experience of applying the new statistical methods in agricultural
experimentation and of explaining their meaning to the non-mathematician; he had received 
an apprenticeship in mathematical statistics under both Karl Pearson and R. A. Fisher; 
he had a shrewd, objective judgement which enabled him to keep clear of controversy and 
to combine what he thought good from the older and newer schools of thought; and, finally, 
he was ready to take any amount of trouble in developing lecture and practical courses 
for new types of student audience. 

He was given accommodation in the School of Agriculture, where in addition to his own 
room he had a larger ‘laboratory’ containing some half dozen desks for post-graduate 
students. He began his teaching with a general course on statistical methods offered in the 
Faculty of Agriculture and a course on mathematical statistics which could be offered for 
Schedule B of the Mathematical Tripos. There was also an optional practical class associated 
with the second course. In the eight years which intervened before the war, the number of 
mathematical students who attended his Tripos course and in many cases stayed on for 
post-graduate work increased steadily. This drawing of Cambridge-trained mathematicians 
into the field of statistics had a profound effect on the rate of development of the subject 
in this country, and it was Wishart who played the essential part in achieving this result. 

Impressions of these days left on his early students are interesting. As is inevitably the 
case when statistics teaching can only be fitted in as one subject in a mathematical syllabus, 
the Schedule B course was rather overcrowded. It was in the more leisurely conditions of 
full-time post-graduate study that Wishart’s teaching was most effective. A characteristic 
aspect of the research laboratory has been described by W. G. Cochran: 


In those days he believed in his students keeping office hours. When he assigned me a desk in the 
Lab., he told me that he expected me to be sitting at the desk most of the day when not in class. He 
instructed me to do three hours computing a day on a table of the 1% level of z to 7 decimal places,
while Fisher in the meantime was computing the 5% table. Having anticipated a free and easy life 
as a graduate student, punctuated of course by periods of esoteric thinking when the spirit moved me, 
I didn’t much like either the office hours or the computing, but I don’t think they did me any harm. 


In asking post-graduates to keep regular hours for statistical work and later on in 
expecting somewhat similar regularity from his lecturers, he was following, perhaps un- 
consciously, Pearson’s tradition established at University College. It was a routine less 
readily accepted at Cambridge, especially on the mathematical side. Another custom in 
which he followed K. P. was in making daily rounds of the laboratory to see how his students 
were getting on and to discuss any difficulties with them. 

H. O. Hartley was a student a year or two after Cochran, and writes: 


John Wishart’s strength as a teacher of applied statistics was two-sided. He had an amazing gift to 
inspire a very theoretical mathematician, like myself, to learn something about applications and to 
bring him into contact with the agriculturalist and biologist. He was also successful, although less so, 
in converting the latter to see the need of mathematical statistics. He would walk into the laboratory
and look over your shoulder as you were struggling with an analysis of variance, point to a figure and say
‘this one looks wrong’, and would usually be infallible in this. He had a computational flair, a great gift
for approximations and took a great interest in computing methods, tables and machines. In short he
was an ideal teacher of statistical analysis, but not quite as convincing as a teacher of statistical inference.
He had a good humoured gift to learn from failure and tell the stories against himself for others to learn
from.

* Yule was made Reader for the last few months of his appointment.

A feature of this Cambridge development in the 1930’s was the location of the statistical 
laboratory in the School of Agriculture, with agricultural and mathematical students 
working more or less side by side. ‘This was, if you like’, M. S. Bartlett points out, ‘a reflexion
of Wishart’s dual qualifications; in those early days at least it was a very happy combination 
which set the tone for a balanced outlook on the theory of statistics, which was always to 
be found in his department.’ 

It has been remarked that Wishart did not produce much original research in these pre- 
war years, and it is true that if we look at his list of publications, the papers written between 
1933 and 1939 were almost entirely of an applied or expository character. But original 
mathematical research is not the only mark of a good statistician. We should remember 
that until Bartlett was appointed in 1938 as a mathematical lecturer to help in the statistical 
work, Wishart was teaching single-handed. At the same time he was acting as consultant 
in statistical matters in connexion with the experimental work of various Cambridge 
research institutes. It was a time too when much hard spadework in the way of relatively 
simple exposition was essential to convince the experimentalists at Cambridge and else- 
where of the value of the new statistical tools. This was a type of work that he found most 
congenial and he threw himself into it to the full. 

The Industrial and Agricultural Research Section of the Royal Statistical Society was 
founded in 1933 and Wishart was one of the six Fellows of the Society who formed the 
original organizing committee. He gave the second paper to the section in 1934 on ‘Statistics 
in Agricultural Research’ (35), and later in the same year provided a long ‘Bibliography of 
Agricultural Statistics, 1931-33’ (36) for volume 1 of the I.A.R.S. Supplement. Indeed, he 
continued to play an active part in the section’s first six years of very vigorous life. 

If the particular organization of statistics at Cambridge worked admirably to start with, 
it did contain the seeds of later trouble. As the agriculturalists became more familiar with 
statistical method and the routine of experimental design was established, the need for a 
Reader with Wishart’s qualifications within the School of Agriculture became less apparent. 
The natural development would seem to have been the creation of an independent uni- 
versity statistical unit or department, drawing its recruits from mathematics and serving 
the university as a whole. But it has never been an easy task to convince a Mathematics 
Faculty that Mathematical Statistics is a subject with a discipline of its own, worthy of 
being allowed an independent existence, and Wishart, both before and after the war, found 
many of the Cambridge mathematicians as conservative in outlook as any in this respect. 
He found himself between two stools, and though there were no doubt faults on all sides to 
account for the friction which resulted, his position was not altogether an easy one. As in 
many another case, when war came he was glad to be able to throw himself into another form 
of occupation which for a time, at any rate, would free him from more academic battles. 

From May 1940 to February 1942 he was in the Army as a Captain employed on Intelli- 
gence work, and then, until 1946, he was occupied with statistical work for the Admiralty 
in London and Bath, working with the rank of Assistant Secretary in the Establishment, 
the Production and the Naval Personnel Divisions of the Secretary’s Department. When
the war ended he hesitated for some time whether to return to Cambridge, but no doubt
the great opportunity of developing the work he had started there drew him back to 
university teaching. 

On the mathematical side Wishart was now able to make great progress. Additional 
members of the Mathematics Faculty were appointed who were to be primarily concerned 
with statistical teaching, so that by the early 1950’s the Reader could call on the help of 
three lecturers and an assistant lecturer. Part II of the revised Mathematical Tripos contained
a course on Random Variables which aimed at preparing the way for fuller statistical
work in Part III.* Additional post-graduate courses were offered and, finally, to give more
adequate training on the applied side, there was a Diploma course which required the
post-graduate student to spend a considerable amount of his time on the application of theory
in a selected field. The Diploma courses at Oxford, Manchester and Aberdeen have followed 
the Cambridge experiment on rather similar lines. 

Approval was given for a Statistical Laboratory to be set up within the Mathematics 
Faculty, and after some delay due to building difficulties this was housed in a temporary 
building in St Andrew’s Hill. When it was opened in the summer of 1949, Wishart could 
look back with some satisfaction at the success which he had had in grafting the teaching of 
statistics on to the Cambridge mathematical system. Yet recognition of his own position 
still came rather grudgingly, for it was not until 1953 that his appointment as Director of 
the Statistical Laboratory was confirmed. It began to be realized, too, as had been found 
elsewhere, that a statistical unit could be of better service to a university if it were free from 
the control of mathematics. 

In his post-war published work the emphasis was again on exposition, but this was now 
mainly in the mathematical rather than the agricultural field. Thus between 1947 and 1951 
at a time when the Institute of Actuaries was introducing additional statistical matter into 
its revised examination syllabus, he wrote four useful papers (54, 55, 59, 65) on different
aspects of statistical theory for the Journal of the Institute’s Students’ Society. New contributions
to theory include three Biometrika papers (53, 62, 67) dealing with his old favourite,
the development of moment and cumulant theory. In the applied field, perhaps the most 
important contribution was a largely rewritten Part I to Wishart & Sanders’s Principles 
and Practice of Field Experimentation (75). This admirable little book, translated into 
Spanish, formed the basis of his 1956 F.A.O. course in Mexico. 

If his output of new scientific work was small, Wishart’s last ten years were very active 
ones. In addition to the work of building up the Cambridge Laboratory, he was made the 
first Chairman of the Royal Statistical Society’s Research Section, when it branched off 
from the Industrial and Agricultural Research Section in 1945. He also took a leading part 
in the Society’s two post-war committees on the Teaching of Statistics. Made an Assistant 
Editor of Biometrika in 1937 and Associate Editor in 1948 he played a loyal and diligent 
part in maintaining the standard and tradition of this Journal. 

With able colleagues available in Cambridge it was possible for him to make use of the 
University’s newly established sabbatical year and to find other opportunities for going 
overseas. Thus he was in the United States for nine months in 1949, lecturing at the Universities
of North Carolina and California. In the autumn of 1954 he lectured for three
months at the F.A.O.’s Training Centre in Experimental Design and the Survey Technique
of Experimentation held at New Delhi. Finally, after going to Mexico in January 1956 as
a technical assistance expert to the Mexican Government, he was responsible for the
day-to-day running of a similar F.A.O. course in Mexico City from April until his death
in July.

* A full description of the courses available in 1955 is given as an Appendix to Wishart’s contribution
(72) to the discussion on the Teaching of Statistics held by the Royal Statistical Society in
February 1955.

A number of warm tributes have been paid to the value of his work on these international 
courses. There were many elements in his success: the carefully planned lectures, his manner 
of getting them across and his skill in getting back to reality after short excursions into the 
abstract. As a colleague of his on the Indian course remarks: ‘Few of his students will 
forget his injunction not to substitute covariance for weed control and other aspects of 
good husbandry.’ But apart from work of a more formal character, he had, and showed that 
he had, a real concern with the particular difficulties and problems of each student and 
colleague. His interest in students from other lands had started in his Rothamsted days 
and continued at Cambridge, both in the hospitality which he and his wife showed to foreign 
visitors in their home and through the activities of the Cambridge All Peoples Association. 
In both India and Mexico he was eager to learn about the art and architecture, the history 
and customs, of the people around him, and he learned enough Spanish to make his in- 
augural speech at the Central American Training Centre in this language. His friendliness 
and obvious enjoyment in all that was happening made a notable contribution to the 
success of these gatherings. 

If we look back in retrospect we shall realize, I think, that John Wishart’s contribution 
to our subject has lain not so much in any outstanding piece of mathematical research or 
the development of new concepts, but in the many-sided and very necessary part which 
he has played in the dissemination of statistical ideas and the application of statistical 
methods during the last 30 years. Whether it was in early trying out of the analysis of 
variance in the field, in convincing the agriculturalist that the principles of experimental 
design were not too hard after all to understand, in exploiting new techniques to the general 
advantage of an expanding methodology, in building up a teaching and research organization 
at Cambridge or in extending the activities of the Royal Statistical Society, he was there,
doing his share of work to the full.

E. S. PEARSON


Bibliography of Wishart’s published work

(1) Determination of $\int_0^{\theta} \cos^{n+1}\theta\, d\theta$ for large values of n, and its application to the probability integral
of symmetrical frequency curves. (1925), Biometrika, 17, 68-78.

(2) Further consideration of the integral $\int_0^{\theta} \cos^{n+1}\theta\, d\theta$ for large values of n. (1925), Biometrika, 17,
469-72.

(3) Note on Barlow’s Tables. (1926), Nature, Lond., 117, 856-7. Accurate square roots (Schlesinger,
F., a reply). (1926), Nature, Lond., 118, 338-9.

(4) On Romanovsky’s generalised frequency curves. (1926), Biometrika, 18, 221-8.

(5) On the approximate quadrature of certain skew curves, with an account of the researches of
Thomas Bayes. (1927), Biometrika, 19, 1-38.

(6) On the distribution of the error of an interpolated value, and on the construction of tables (with
R. A. Fisher). (1927), Proc. Camb. Phil. Soc. 23, 912-21.

(7) On errors in the multiple correlation coefficient due to random sampling. (1928), Mem. Roy.
Meteor. Soc. 2, 29-37 and (summarized) Quart. J. R. Met. Soc. 54, 308.

(8) The generalised product moment distribution in samples from a normal multivariate population.
(1928), Biometrika, 20A, 32-52.

(9) Table of significant values of the multiple correlation coefficient. (1928), Quart. J. R. Met. Soc.
54, 258-9.

(10) Sampling errors in the theory of two factors. (1928), Brit. J. Psychol. 19, 180-7.

(11) Le traitement correct des problèmes de corrélation multiple en météorologie et en agriculture.
(1928), Ass. Franc. l’Avance. des Sci., Congr. d. l. Rochelle (pp. 4).

(12) A problem in combinatorial analysis giving the distribution of certain moment statistics. (1928),
Proc. Lond. Math. Soc. 29, 309-21.

(13) The correlation between product moments of any order in samples from a normal population.
(1929), Proc. Roy. Soc. Edinb. 49, 78-91.

(14) Fertiliser trials on the ordinary farm (with H. J. G. Hines). (1929), J. Minist. Agric. 36,
524-32.

(15) A study in sampling technique: the effect of artificial fertilisers on the yield of potatoes (with
A. R. Clapham). (1929), J. Agric. Sci. 19, 600-18.

(16) Studies in crop variation. VII. The influence of rainfall on the yield of barley at Rothamsted
(with Winifred A. Mackenzie). (1930), J. Agric. Sci. 20, 417-39.

(17) The arrangement of field experiments and the statistical reduction of the results (with R. A.
Fisher). (1930), Tech. Commun. Imp. Bur. Soil Sci. no. 10.

(18) A method of estimating the yield of a missing plot in field experimental work (with F. E. Allan).
(1930), J. Agric. Sci. 20, 399-406.

(19) On the secular variation of rainfall at Rothamsted. (1930), Mem. R. Met. Soc. 3, 127-37.

(20) Fertiliser trials in 1929 (with H. V. Garner). (1930), J. Minist. Agric. 37, 793-802.

(21) The derivation of certain high order sampling product moments from a normal population.
(1930), Biometrika, 22, 224-38.

(22) The analysis of variance illustrated in its application to a complex agricultural experiment on
sugar beet. (1931), Arch. Pflanzenbau, 5, 561-84.

(23) The mean and second moment coefficient of the multiple correlation coefficient in samples from
a normal population. (1931), Biometrika, 22, 353-61.

(24) The derivation of the pattern formulae of two-way partitions, from those of simpler patterns
(with R. A. Fisher). (1931), Proc. Lond. Math. Soc. 33, 195-208.

(25) Notes on frequency-constants. (1931), J. Inst. Actu. 62, 174-7.

(26) Methods of field experimentation and the statistical analysis of the results. (1931), Rothamsted
Conferences, XIII, 13-21.

(27) Interpolation without printed differences: Jordan’s and Aitken’s formulae. (1932), Math. Gaz.
16, 14-25.

(28) The distribution of second order moment statistics in a normal system (with M. S. Bartlett).
(1932), Proc. Camb. Phil. Soc. 28, 455-9.

(29) A note on the distribution of the correlation ratio. (1932), Biometrika, 24, 441-56.

(30) The generalised product moment distribution in a normal system (with M. S. Bartlett). (1933),
Proc. Camb. Phil. Soc. 29, 260-70.

(31) A comparison of the semi-invariants of the distributions of moment and semi-invariant estimates
in samples from an infinite population. (1933), Biometrika, 25, 52-60.

(32) A statistical analysis of the inter-relations of litter size and duration of pregnancy on the birth
weight of rabbits (with J. Hammond). (1933), J. Agric. Sci. 23, 463-72.

(33) The theory of orthogonal polynomial fitting (review). (1933), J. R. Statist. Soc. 96, 487-91.

(34) Field experimentation: the modern technique. (1934), Agric. Progr. 11, 149-56.

(35) Statistics in agricultural research. (1934), J. R. Statist. Soc. Suppl. 1, 26-51.

(36) Bibliography of agricultural statistics, 1931-33. (1934), J. R. Statist. Soc. Suppl. 1, 94-106.

(37) Analysis of variance and analysis of covariance, their meaning and their application in crop
experimentation. (1934), Report of 2nd Emp. Cotton Gr. Corp. Conf. pp. 83-9.

(38) Principles and Practice of Field Experimentation (with H. G. Sanders). (1935), London, Emp.
Cotton Gr. Corp.

(39) The evolution of the field experiment. (1935), Proc. 1st Plant Breeding Conf. Nanking, China,
1934-5, pp. 25-30.

(40) Tests of significance in analysis of covariance. (1936), J. R. Statist. Soc. Suppl. 3, 79-82.

(41) Statistics in Chinese agricultural research. (1936), J. Amer. Statist. Ass. 31, 127-8.

(42) A theorem concerning the distribution of joins between line segments (with H. O. Hirschfeld).
(1936), J. Lond. Math. Soc. 11, 227-35.

(43) The nutrition of the bacon pig. I. The influence of high levels of protein intake on growth,
conformation and quality in the bacon pig (with H. E. Woodman et al.). (1936), J. Agric. Sci.
26, 546-619.








John Wishart 


Field experiments of factorial design. (1936), J. Agric. Sci. 28, 299-306. 

Growth-rate determinations in nutrition studies with the bacon pig, and their analysis. (1938), 
Biometrika, 30, 16—28. 

Crop estimation and its relation to agricultural meteorology. (1938), J. R. Statist. Soc. Suppl. 
5, 20-5. 

Statistical treatment of animal experiments. (1939), J. R. Statist. Soc. Suppl. 6, 1-22. 

Some aspects of the teaching of statistics. (1939), J. R. Statist. Soc. 102, 532-51. 

Field Trials: their Lay-out and Statistical Analysis. (1940), Cambridge: Imperial Bureau of 
Plant Breeding and Genetics. 

Mathematical Tables. Vol. VIII. Number-divisor Tables (by J.W.L.Glaisher). (1940). (Editorial 
work in completion of Tables for the British Association Tables Committee.) Cambridge 
University Press. 

‘Student’s’ Collected Papers. (1944) (edited with E. 8. Pearson for the Biometrika Trustees). 
Cambridge University Press. 

Note on the probability distribution arising in the study of the Instivute’s examinations. (1947), 
J. Inst. Actu. Stud. Soc. 6, 140-3. 

The cumulants of the z and of the logarithmic y* and¢ distributions. (1947), Biometrika, 34, 170-8. 

The variance ratio test in statistics. (1947), J. Inst. Actu. Stud. Soc. 6, 172-84. 

Proof of the distributions of y’, of the estimate of variance, and of the variance ratio. (1947), 
J. Inst. Actu. Stud. Soc. 7, 98-103. 

La estadistica en Inglaterra en relacién con las facultades universitarias de matematicas. (1947), 
Paper read to Consejo Superior de Investigaciones Cientificas, Instituto ‘Jorge Juan’ de 
Matematicas, April. 

Statistical aspects of demobilization in the Royal Navy. (1947), J. R. Statist. Soc. 110, 27-44. 

Proofs of the distribution law of the second order moment statistics. (1948), Biometrika, 35, 
55-7. 

Tests of significance in the simple regression problem. (1948), J. Inst. Actu. Stud. Soc. 8, 38-43. 

Test of homogeneity of regression coefficients, and its application in the analysis of covariance. 
(1948). Paper read at the Colloque de Calcul des Probabilités et de Statistique Mathématique, 
Lyon. (1949), Calcul des Prob. 93-100. 

The teaching of statistics. (1948), J. R. Statist. Soc. A, 111 (Opening contribution to the Dis- 
cussion), 212-7. 

Cumulants of multivariate multinomial distributions. (1949), Biometrika, 36, 47-58. 

Analisis de la Varianza y Covarianza. (1949), Madrid: Instituto Nacional de Investigaciones 
Agronomicas. 

Field Trials. II. The Analysis of Covariance. (1950), Cambridge: Commonwealth Bureau of 
Plant Breeding and Genetics. 

The variance-ratio distribution. (1951), J. Inst. Actu. Stud. Soc. 10, 258-9. 

Biological Experimentation. (1951), Incorp. Statistician, 2, 21-7. 

Moment coefficients of the k-statistics in samples from a finite population. (1952), Biometrika, 
39, 1-13. 

The combinatorial development of the cumulants of the k-statistics. (1952), Trabajos de Esta- 
distica, 3, 13-26. 

The teaching of statistics. (1952), Incorp. Statistician, 3, 3-6. 

Orthogonal polynomial fitting (with T. Metakides). (1953), Biometrika, 40, 361-9. 

The factorial moments of the distribution of joins between line segments. (1954), Biometrika, 
41, 555-6. 

Contribution to a discussion on the teaching of statistics at university level. (1955), J. R. 
Statist. Soc. A, 118, 194-5, 210-11. 

The variance-ratio test. (1955), Int. Statist. Inst., Statistical Seminar, Rome, 1953, 9-12. 

Multivariate Analysis. (1955), Applied Statistics, 4, 103-16. 

Principles and Practice of Field Experimentation (with H. G. Sanders) (2nd ed. 1955).
Commonwealth Bureau of Plant Breeding and Genetics. (See (38) above.)

‘Significant Difference’ in the analysis of field trials. (1956), Agric. Rev. May 1956, (pp. 3). 

χ² probabilities for large numbers of degrees of freedom. (1956), Biometrika, 43, 92-5.










RESTRICTED SEQUENTIAL PROCEDURES 


By P. ARMITAGE 


Statistical Research Unit of the Medical Research Council, 
London School of Hygiene and Tropical Medicine 


1. INTRODUCTION 


When sequential methods are advocated for industrial inspection work, the argument is 
usually advanced on grounds of economy. A given degree of discrimination between ‘good’ 
and ‘bad’ quality is achieved, by a sequential procedure, with a smaller average number of 
observations than that required by an equivalent non-sequential procedure. This may well 
be a relevant criterion when the sampling inspection is to be repeated frequently as a routine 
operation. In many fields of experimentation this argument carries little weight, because 
the uncertainty in knowing how long a particular sequential experiment will continue may 
outweigh any possible long-term economy. There is, however, a rather different reason 
why sequential procedures of some sort may be suitable for many trials in clinical and 
preventive medicine (which are essentially classical randomization experiments to compare 
different medical treatments, with human subjects as experimental units). These trials are 
usually sequential in the trivial sense that the subjects enter the experiment at different 
points of time, but this fact would not necessarily suggest sequential methods of design and 
analysis. The important point is that a doctor responsible for patients in a trial will frequently 
regard it as unethical to continue the trial if he is convinced that a difference between the 
effects of two treatments has appeared, because this would mean withholding from a patient 
under his care a treatment which he regarded as better than that being given. This con- 
sideration will be most cogent when life and death are involved; it may not apply at all if 
the illnesses being treated are fairly trivial, but it will usually be present to some extent. 
There is, therefore, a natural tendency for the organizers of a clinical or preventive trial to 
examine the observations at various points of time, and to stop or modify the trial if a 
difference between treatment effects is apparent. Sequential experimentation of this sort, 
however reasonable, will to some extent affect the validity of any probability statements 
involving integration over sample space (Anscombe, 1954), and it is natural to investigate 
the possibility of carrying out the trial according to some well-defined sequential procedure 
for which valid probability statements can be made. 

In an earlier paper (Armitage, 1954) I suggested that comparisons of two treatments 
could be made by fairly straightforward modifications of Wald’s methods. Means could be 
compared by two-sided sequential t-tests (National Bureau of Standards, 1951), and pro- 
portions could be compared by a two-sided analogue of Wald’s one-sided test for comparative 
trials. (The test for proportions has been proposed also by de Boer (1953).) These procedures 
retain the feature, common to all probability-ratio sequential tests, that the number of 
observations required before a boundary is reached is unlimited. This property is in practice 
unattractive, and although a Wald sequential procedure can be truncated the effects of 
truncation are not generally known. To preserve, even approximately, the nominal ‘risks 
of error’ the truncation must be applied after a fairly large number of observations, and 
the distributions of the sample number still have very high variability (for examples of 
such distributions, see Baker, 1950). Such variability discourages the use of this type of
procedure for medical trials, since it is clearly inconvenient to plan a trial the duration
of which has an expectation of between, say, 2 and 9 months (depending on the actual 
difference between the effects of the treatments), but which may in an individual instance 
have to be continued for as long as 3 years. 

The type of procedure which seems to be required is one which incorporates truncation 
as an integral part of the scheme, which allows much less variability of sample number 
than do Wald schemes, and for which some at least of the probabilistic properties are known. 
Bross (1952) constructed some closed sequential designs for the comparison of two pro- 
portions, but although they satisfy the criteria stated above to be desirable, each design 
had to be constructed specially, to some extent by trial and error, and only two have so far 
been published. In the present paper I consider a class of closed sequential procedures, 
which I have called ‘restricted sequential procedures’. These seem to satisfy the above 
requirements, but their probabilistic properties depend on a diffusion approximation, the 
adequacy of which has not been fully investigated. The main feature of the method is that 
the sample number cannot be greater than some predetermined value, N. As sampling
proceeds, a sample path may be drawn on a diagram, and sampling will stop after less than 
N observations if one of two ‘outer boundaries’ is crossed. These outer boundaries are 
determined by the requirement that the likelihood ratio of either of two alternative hypo- 
theses, to the null hypothesis, takes some preassigned high value. It may be arranged that 
the probability, on the null hypothesis, that the sample path will stop on an outer boundary 
takes a predetermined small value, 2α. If the procedure is used to provide a two-sided test
of the null hypothesis, sample paths reaching the outer boundaries will then correspond 
to results significant at the 2α level; those failing to reach the outer boundaries before N
observations will correspond to non-significant results. The greater the apparent departure 
from the null hypothesis, the sooner will one of the outer boundaries be hit, and sampling 
brought to a close—a feature which seems to accord with medical requirements. 

These procedures were developed with medical applications in mind, but they may be 
of some use in industrial or other situations. I have deliberately not referred to them ex- 
clusively as sequential tests, because the point of view adopted here is that a sequential 
procedure is primarily the specification of a stopping rule. If its probabilistic properties 
are sufficiently well known, a given stopping rule may be used to different ends: it may 
provide an acceptance procedure, in which case the appropriate formulation may be in 
terms of risks of acceptance or rejection; it may provide a test of a null hypothesis, in which 
case its power function may be of interest; it may be used for the estimation of an unknown 
parameter; or it may be used in more than one of these ways on the same data. 

For an account of a clinical trial conducted by these methods, see Snell & Armitage (1957). 


2. NORMAL DISTRIBUTION WITH KNOWN VARIANCE 
2-1. Suppose that observations $x_i$ $(i = 1, 2, \ldots)$ are drawn at random from a normal
distribution with unknown mean $\mu$ and known variance $\sigma^2$. Let $y_n = \sum_{i=1}^{n} x_i$. ‘Restricted
sequential procedures’ will consist in sampling until one of the three following boundaries
is reached:
(i) the ‘upper boundary’, U: $y_n = a + bn$ $(a > 0)$;
(ii) the ‘lower boundary’, L: $y_n = -a - bn$ $(a > 0)$;
(iii) the ‘middle boundary’, M: $n = N$.

U and L will be called ‘outer boundaries’. In order that neither outer boundary crosses
$y_n = 0$ for $n < N$ we shall require $a + bN > 0$.

Suppose first that only U and M are present. If the discrete steps in n are replaced by
continuous movement in time, where the time unit corresponds to a single observation,
the random walk may be approximated by the one-dimensional diffusion process with drift
$\mu - b$ per unit time, growth in variance at a rate $\sigma^2$, and an absorbing barrier at a. Hence, the
probability of crossing U with not more than N observations is approximated by the
probability of absorption before time N, which is given (Bartlett, 1946) by

$$P_1(\mu, N) = F\!\left(\frac{-a + (\mu - b)N}{\sigma\sqrt{N}}\right) + \exp\!\left\{\frac{2a(\mu - b)}{\sigma^2}\right\} F\!\left(\frac{-a - (\mu - b)N}{\sigma\sqrt{N}}\right), \qquad (1)$$

where $F(u) = \int_{-\infty}^{u} (2\pi)^{-1/2} e^{-t^2/2}\,dt$. This formula is valid for $a > 0$, as is appropriate here. The
same approach to this problem is suggested by Page (1957).
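As a numerical illustration of formula (1), the following minimal Python sketch evaluates $P_1(\mu, N)$. The function name `p1` and its argument names are hypothetical, chosen for this example only; the boundary constants in the last line are those tabulated later in Table 3 for $\beta = 0.05$, for which $P_1(0, N)$ should be close to $\alpha = 0.025$.

```python
from math import exp, sqrt
from statistics import NormalDist

F = NormalDist().cdf  # standard normal distribution function F(u)


def p1(mu, n_max, a, b, sigma=1.0):
    """Formula (1): approximate probability of crossing U by n = n_max."""
    drift = mu - b                  # drift of the diffusion relative to U
    scale = sigma * sqrt(n_max)
    return (F((-a + drift * n_max) / scale)
            + exp(2.0 * a * drift / sigma ** 2) * F((-a - drift * n_max) / scale))


# Null hypothesis (mu = 0) with a/sigma = 6.10, b/sigma = 0.298, N = 50.
print(p1(0.0, 50, a=6.10, b=0.298))
```

With a horizontal boundary ($b = 0$) and $\mu = 0$ the two terms coincide and the expression reduces to $2F(-a/\sigma\sqrt{N})$, the familiar reflection result.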

If L is now introduced, the probability of absorption on U is affected. The expression (1)
should be reduced by the probability that the path of the diffusing particle crosses first
L and then U before $n = N$. This effect appears to be less important with the convergent
or divergent boundaries considered here than it is with the Wald type of parallel-line
boundaries.

For divergent boundaries (when $b > 0$), the probability that a path, starting at a point
on L, will cross U before $n = N$, is less than it would be for a path starting at the point
$(n = 0,\ y_n = -a)$. For such a path the probability of crossing the line $y_n = bn$ before $n = N$
is equal to $P_1(\mu, N)$, and the conditional probability of then crossing U before $n = N$ is
less than $P_1(\mu, N)$. The total probability is therefore less than $\{P_1(\mu, N)\}^2$. But the
probability that a path starting at the origin will cross L before $n = N$ is less than $P_1(-\mu, N)$.
Hence, an upper bound to the probability that a path crosses L and then U before $n = N$ is
$P_1(-\mu, N)\{P_1(\mu, N)\}^2$, and an upper bound to the proportionate error (i.e. the ratio of the
error to the nominal value $P_1(\mu, N)$) is $P_1(-\mu, N)\,P_1(\mu, N)$. For particular choices of a, b
and N this upper bound may not be very small, but in the applications discussed below
it will be shown to be small.

No analogous upper bound has been obtained for convergent boundaries, and in most of 
the following discussion we therefore assume that b > 0. 


2-2. The specification of the boundaries requires three arbitrary constants: a, b and N. 
In general, therefore, a set of boundaries will not be determined uniquely by the require- 
ment that the procedure should satisfy only one or two independent conditions. Consider 
first the problem of determining values of a, b and N satisfying the following two conditions: 

(a) On the hypothesis $H_0$ that $\mu = 0$, the probabilities that sampling will end on each of
the outer boundaries are $\alpha < \tfrac{1}{2}$ (these probabilities clearly being equal by symmetry). If
the procedure is used to provide a significance test of $H_0$, a sample path ending on one of the
outer boundaries may then be regarded as significant at a level $2\alpha$.

(b) On the hypothesis $H_1$ that $\mu = \mu_1 > 0$, the probability of reaching U is $1 - \beta$ $(> \alpha)$.
(By symmetry, $1 - \beta$ will also be the probability of reaching L on the hypothesis $H_{-1}$ that
$\mu = -\mu_1$.)

Since only two restrictions have been imposed on the three constants a, b and N, we should
expect them to determine a family of restricted sequential designs. The approximate
solution below provides one design out of this family.

Choose

$$a = \frac{\sigma^2}{\mu_1}\ln\frac{1-\beta}{\alpha}, \qquad b = \frac{\mu_1}{2}. \qquad (2)$$

Then, on U, the likelihood ratio of $H_1$ to $H_0$ is $(1-\beta)/\alpha$. Whether L is present or not, the
probabilities, on $H_1$ and $H_0$, of reaching any specified set of points on U will be in the ratio
$1-\beta$ to $\alpha$ (neglecting the overshooting of the boundaries). In particular, if $P(\mu, N)$ is the
probability of absorption on U in less than N observations, no other boundary having
previously been reached, and we choose N such that $P(\mu_1, N) = 1-\beta$, it will follow that
(again neglecting the overshooting of the boundaries) $P(0, N) = \alpha$. Conditions (a) and (b)
will then be satisfied. Approximating to $P(\mu, N)$ by $P_1(\mu, N)$, given by (1), we require

$$1-\beta = F\!\left(-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt{N}} + \frac{\Delta\sqrt{N}}{2}\right) + \frac{1-\beta}{\alpha}\,F\!\left(-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt{N}} - \frac{\Delta\sqrt{N}}{2}\right), \qquad (3)$$

where $\Delta = \mu_1/\sigma$. Since $\alpha$, $\beta$ and $\Delta$ are known, (3) may be solved for N, by successive
approximation. In fact, for given $\alpha$ and $\beta$, (3) may be solved for $\Delta\sqrt{N}$; the solutions, $\Delta\sqrt{N_1}$, for
various combinations of $\alpha$ and $\beta$ are shown in Table 1.


Table 1. Normal distribution with known variance: values of $\Delta\sqrt{N_1}$,
for various values of $\alpha$ and $\beta$

  β \ α     0.050    0.025    0.005
  0.10      3.41     3.71     4.32
  0.05      3.92     4.22     4.80
  0.01      4.95     5.23     5.79




The use of the single-boundary diffusion approximation (1) is justified since the upper
bound to the proportionate error in $\alpha$ (and hence in $1-\beta$) is $\{P_1(0, N)\}^2 = \alpha^2$. In practice
$\alpha$ will usually be chosen to be fairly small (say 0.05 or less), since in the practical applications
envisaged at the moment $\alpha$ may be regarded as the (one-sided) significance level at which
one wishes to use less than the maximum sample number, N. The other approximation
involved (that of ignoring the overshooting of the boundaries) is of the same type as that
involved in the formulae customarily used in Wald’s sequential tests. Wald’s formulae also
can be obtained by diffusion theory (Bartlett, 1946), and it would seem a reasonable
conjecture that the accuracy of the approximation in the present schemes would be similar
to that of Wald’s approximations.

One other member of the family of restricted sequential procedures satisfying (a) and (b)
is the degenerate case obtained by putting $a = \infty$, $b = -\infty$, $N = N_0$ and $a + bN_0 = u_\alpha \sigma\sqrt{N_0}$,
where $F(u_\alpha) = 1-\alpha$, $F(u_\beta) = 1-\beta$, and $\Delta\sqrt{N_0} = u_\alpha + u_\beta$.

This is the fixed-sample-size procedure satisfying (a) and (b). Since the procedure with
$N = N_1$ and the fixed-sample-size procedure with $N = N_0$ both satisfy (a) and (b), and since
the former permits sampling to stop for $n < N_1$, we should expect to find $N_1 > N_0$. The ratios
$N_1/N_0$ are given in Table 2 for various combinations of $\alpha$ and $\beta$.

Other members of the family of procedures satisfying (a) and (b) could be found by trial
and error, but no attempt to do this has yet been made.


Table 2. Normal distribution with known variance: values of $N_1/N_0$,
for various values of $\alpha$ and $\beta$

  β \ α     0.050    0.025    0.005
  0.10      1.355    1.313    1.252
  0.05      1.417    1.368    1.295
  0.01      1.553    1.490    1.393
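Since $\Delta\sqrt{N_0} = u_\alpha + u_\beta$, the ratio $N_1/N_0$ is the square of the ratio of a Table 1 entry to $u_\alpha + u_\beta$. The short sketch below illustrates the arithmetic; the function names are hypothetical, and because the input is a Table 1 entry rounded to two decimals the result agrees with Table 2 only to about the second decimal.

```python
from statistics import NormalDist


def fixed_sample_delta_root_n(alpha, beta):
    """Delta * sqrt(N_0) = u_alpha + u_beta for the fixed-sample-size procedure."""
    nd = NormalDist()
    return nd.inv_cdf(1.0 - alpha) + nd.inv_cdf(1.0 - beta)


def sample_size_ratio(x1, alpha, beta):
    """N_1 / N_0, where x1 is the tabulated value of Delta * sqrt(N_1)."""
    return (x1 / fixed_sample_delta_root_n(alpha, beta)) ** 2


# Table 1 gives Delta * sqrt(N_1) = 3.41 for alpha = 0.05, beta = 0.10.
print(round(sample_size_ratio(3.41, 0.05, 0.10), 3))
```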





2-3. It might in practice be more convenient to specify in advance the maximum sample
number N, rather than the probability $\beta$. We therefore consider the problem of determining
values of a and b satisfying condition (a) as before, given that the maximum sample number
is N. Again, we should expect a family of procedures satisfying these conditions.

Since $\alpha$ and N are known, equation (3) may be solved for $\Delta$ by successive approximation
for any chosen value of $\beta$. Denote the solution by $\Delta(\beta)$. Each pair of values $(\beta, \Delta(\beta))$
determines the values of a and b. A family of procedures may therefore be generated by
allowing $\beta$ to assume different values. We shall here relax the condition required in § 2-2,
that $\beta < 1-\alpha$. This is a natural requirement to impose if the boundaries are to be determined
by considerations of power, but it is instructive to consider the wider class of boundaries
obtained by allowing $\beta$ to vary between 0 and 1. In § 2-4 we prove that $\Delta(\beta)$ is a single-valued
monotonic decreasing function of $\beta$, ranging from $+\infty$ to $-\infty$ as $\beta$ ranges from 0 to 1. For
$\beta \lesseqgtr 1-\alpha$, $\Delta(\beta) \gtreqless 0$. Hence, for all $\beta$, the solutions satisfy

$$a\left(= \frac{\sigma}{\Delta(\beta)}\ln\frac{1-\beta}{\alpha}\right) > 0,$$

as required. (This result is intuitively plausible, since $(1-\beta)/\alpha$ is the likelihood ratio, at
points on U, of the hypothesis that $\mu = \Delta(\beta)\sigma$ to the hypothesis that $\mu = 0$. It is therefore
reasonable that $(1-\beta)/\alpha > 1$ if $\Delta(\beta) > 0$ and $(1-\beta)/\alpha < 1$ if $\Delta(\beta) < 0$.) It is also proved in
§ 2-4 that $(a + bN)/\sigma\sqrt{N} > u_\alpha > 0$; the requirement of § 2-1 that $a + bN > 0$ is thus satisfied. For
$\beta > 1-\alpha$, $b\ (= \sigma\Delta(\beta)/2) < 0$ and the boundaries are convergent. Since, for convergent
boundaries, no upper bound is available for the proportionate error involved in using the
single-boundary diffusion theory, the procedures obtained for values of $\beta > 1-\alpha$ are at
present of less practical interest than those for $\beta < 1-\alpha$. Table 3 gives values of $a/\sigma$, $b/\sigma$
and $\Delta(\beta)$ for various values of $\beta$, in the particular case where $\alpha = 0.025$, $N = 50$. These
values were obtained by successive approximation. The corresponding upper boundaries
are illustrated in Fig. 1.

The limiting forms as $\beta \to 0$ and $\Delta(\beta) \to \infty$ are of little practical interest, because when the
slope of the outer boundary is very steep the total probability of hitting the upper boundary,
$\alpha$, is largely concentrated at very low values of n, and in these circumstances the diffusion
process (which allows the boundary to be crossed for any $n > 0$) is a poor approximation to
a process involving only integral n.










The horizontal boundary obtained for $\beta = 1-\alpha$ is of some interest, as procedures with
one horizontal boundary have been discussed by Rao (1950). The value of a is indeterminate,
but the limiting value, $a_0$, is shown in § 2-4 to satisfy the equation

$$F(a_0/\sigma\sqrt{N}) = 1 - \tfrac{1}{2}\alpha. \qquad (4)$$


Table 3. Normal distribution with known variance: values of $a/\sigma$, $b/\sigma$ and $\Delta(\beta)$ for restricted
sequential procedures with $\alpha = 0.025$, $N = 50$, and various values of $\beta$ (cf. Fig. 1)

    β        a/σ       b/σ       Δ(β)
  0.001      4.02     0.458     0.916
  0.010      4.97     0.370     0.740
  0.050      6.10     0.298     0.596
  0.100      6.82     0.263     0.525
  0.500      9.95     0.150     0.301
  0.900     13.7      0.051     0.101
  0.975     15.8      0.000     0.000
  0.980     16.1     -0.007    -0.014
  0.990     17.0     -0.027    -0.054
  0.999     19.5     -0.082    -0.165
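The construction of § 2-3 (solve (3) for $\Delta$ at fixed $\alpha$ and N, then obtain a and b from (2)) can be sketched as follows. This is an illustration under stated assumptions rather than the original computation: the function names are hypothetical, the root is found by bisection on $x = \Delta\sqrt{N}$, and the search is made on the positive axis for $\beta < 1-\alpha$ and on the negative axis for $\beta > 1-\alpha$ (the case $\beta = 1-\alpha$ is excluded).

```python
from math import log, sqrt
from statistics import NormalDist

F = NormalDist().cdf  # standard normal distribution function


def phi(x, alpha, beta):
    """(1 - beta) minus the right-hand side of (3), with x = Delta * sqrt(N)."""
    ell = log((1.0 - beta) / alpha)
    return ((1.0 - beta) - F(-ell / x + x / 2.0)
            - (1.0 - beta) / alpha * F(-ell / x - x / 2.0))


def boundary(alpha, beta, n_max):
    """Return (a/sigma, b/sigma, Delta) for beta != 1 - alpha."""
    lo, hi = (1e-9, 60.0) if beta < 1.0 - alpha else (-60.0, -1e-9)
    sign_lo = phi(lo, alpha, beta) > 0
    for _ in range(200):                      # bisection on the sign change
        mid = 0.5 * (lo + hi)
        if (phi(mid, alpha, beta) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi) / sqrt(n_max)
    return log((1.0 - beta) / alpha) / delta, delta / 2.0, delta


for beta in (0.001, 0.05, 0.5, 0.9, 0.99):
    a, b, delta = boundary(0.025, beta, 50)
    print(beta, round(a, 2), round(b, 3), round(delta, 3))
```

The values printed may be compared with the corresponding rows of Table 3.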








[Figure: upper boundaries U, plotted as $y_n/\sigma$ (vertical axis) against the number of observations, n (horizontal axis, 0 to 50), one line for each value of $\beta$.]

Fig. 1. Normal distribution with known variance: upper boundaries, U, for various procedures with
$\alpha = 0.025$, $N = 50$. The boundaries are generated by the parameter $\beta$, the values of which are
shown in the diagram. For $\beta = 1.0$, the outer boundary is vertical, corresponding to a
fixed-sample-size procedure.


In the notation of § 2-2, $a_0 = u_{\alpha/2}\,\sigma\sqrt{N}$. The boundary therefore meets $n = N$ at a point
corresponding to a fixed-sample-size significance level of $\tfrac{1}{2}\alpha$, on a one-sided test. The upper
bound to the proportionate error, derived for divergent boundaries, is readily seen to be
valid also for horizontal boundaries. Moreover, an exact expression is available for the
distribution of absorption time in the diffusion problem with two parallel boundaries
(e.g. Feller, 1950, p. 304, Problem 7). In this particular case the upper bound for the
proportionate error appears to be unnecessarily wide. For instance, if the boundaries are
set at $y_n = \pm a_0$, where $a_0$ is given by (4) with $\alpha = 0.025$, the probability, on $H_0$, of absorption
before $n = N$ falls short of 0.05 by less than $10^{-10}$.
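The size of this shortfall can be examined numerically. The sketch below does not reproduce Feller's expression; it uses the standard reflection (method of images) series for a driftless diffusion between two horizontal barriers at $\pm c\,\sigma\sqrt{N}$, for which the absorption probability by time N is $4\{Q(c) - Q(3c) + Q(5c) - \ldots\}$, with $Q(x) = 1 - F(x)$. The function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def upper_tail(x):
    """Q(x) = 1 - F(x), computed with erfc so that very small tails stay accurate."""
    return 0.5 * erfc(x / sqrt(2.0))


def two_sided_shortfall(alpha, terms=20):
    """Nominal 2*alpha minus the two-barrier absorption probability on H_0."""
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # a_0 / (sigma sqrt(N)) from (4)
    # The nominal value is 4 Q(c) = 2 alpha; the remaining terms of the
    # alternating series give the correction for paths touching both barriers.
    return 4.0 * sum((-1) ** (k + 1) * upper_tail((2 * k + 1) * c)
                     for k in range(1, terms))


print(two_sided_shortfall(0.025))
```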

Consider now the effect of replacing the diffusion process by discrete sampling, in the
situation considered by Rao. Then, if U is horizontal and L is omitted (as in Rao’s
procedure), $a_0$ (defined by (4)) provides a rather close upper bound to the exact value $A_0$
required to satisfy given values of $\alpha$ and N. For, if the equation of U is $y_n = a_0$, any sample
path which, if continued to $n = N$, satisfies $y_N > a_0$, must cross U at some $n \le N$. Consider
a path which first crosses U at $n = n_0 < N$. On hypothesis $H_0$, the probability that such a
path, when continued to $n = N$, will satisfy $y_N > a_0$, is slightly greater than $\tfrac{1}{2}$, for all $n_0$.
(It would equal $\tfrac{1}{2}$ if the paths did not overshoot the boundary.) If the probability, on $H_0$,
of first crossing U at $n = n_0$ is $\alpha(n_0)$, then the probability that $y_N > a_0$ is slightly greater than

$$\frac{1}{2}\sum_{n_0=1}^{N-1}\alpha(n_0) + \alpha(N),$$

which is slightly greater than $\frac{1}{2}\sum_{n_0=1}^{N}\alpha(n_0)$.

But, by the definition of $a_0$, the probability that $y_N > a_0$ is $\tfrac{1}{2}\alpha$. Hence

$$\alpha > \sum_{n_0=1}^{N}\alpha(n_0),$$

and, in order to replace the inequality by an equality, $a_0$ must be reduced slightly. Hence
$a_0$ provides a fairly close upper limit to $A_0$, rather better than that given by Rao (1950),
as may be seen from Table 4.


Table 4. Normal distribution with known variance: procedure with single boundary at $y_n = A_0$,
restricted at $n = N$. Upper bounds for $A_0/\sigma\sqrt{N}$, for various values of $\alpha$

                  Upper bound given by
    α       u_{α/2} = a_0/σ√N     Rao, p. 365
  0.050          1.9600             2.0626
  0.025          2.2414             2.337
  0.010          2.5758             2.6655
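The first column of upper bounds in Table 4 is the normal deviate defined by equation (4). A minimal sketch of the computation follows; the function name `horizontal_boundary` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def horizontal_boundary(alpha, n_max, sigma=1.0):
    """a_0 from equation (4): F(a_0 / (sigma sqrt(N))) = 1 - alpha / 2."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) * sigma * sqrt(n_max)


# With N = 1 and sigma = 1 the function returns u_{alpha/2} itself.
for alpha in (0.050, 0.025, 0.010):
    print(alpha, round(horizontal_boundary(alpha, 1), 4))
```

For $\alpha = 0.025$ and $N = 50$ the same function gives $a_0/\sigma$ close to 15.8, the value shown in Table 3 for the horizontal boundary.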

















The other limiting form, obtained as $\Delta(\beta) \to -\infty$ and $\beta \to 1$, is of course a fixed-sample-size
procedure, with the upper boundary consisting of those points significant at a
probability level $\alpha$, on a one-sided test. This is proved in § 2-4. It may be noted that in this case
the proportionate error introduced by using the one-boundary diffusion approximation is
zero, even though $b < 0$.

The choice of a restricted sequential procedure, for given values of $\alpha$ and N, is almost
embarrassingly wide, and seems to depend on how ‘sequential’ we wish the procedure to
be: to what extent, that is, we wish the sample number to be reduced below the maximum
of N, as the mean, $\mu$, increases. At one extreme we have the fixed-sample-size procedure, in
which the sample number is always N. At the other extreme we have procedures, based on
low values of $\beta$ and high values of $\Delta(\beta)$, which terminate very much earlier for fairly high
values of $\mu$. It seems likely (but has not been proved) that, for given $\alpha$ and N, the procedures
based on the higher values of $\beta$ (i.e. the less ‘sequential’ ones) are the more powerful, in
the sense that for a given value of $\mu > 0$ the probability of crossing the upper boundary is
greater. In the absence of any exact formulation of risk functions, it would seem reasonable
(judging from Fig. 1) to use procedures based on values of $\beta$ between 0.001 and 0.50.

If the given value of $\alpha$ is one of those used in Table 1 (0.005, 0.025 or 0.05), the boundaries
of three procedures (based on $\beta$ = 0.01, 0.05 and 0.10) are easily found, for division of the
tabulated value by $\sqrt{N}$ gives $\Delta(\beta)$, and, from (2),

$$a = \frac{\sigma}{\Delta(\beta)}\ln\frac{1-\beta}{\alpha}, \qquad b = \frac{\sigma\Delta(\beta)}{2}.$$

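2-4. We now prove the results stated in § 2-3. We have first

THEOREM 1. For given values of $\alpha$ and N, the solution $\Delta(\beta)$ of (3) is a single-valued function
of $\beta$, decreasing monotonically from $+\infty$ to $-\infty$ as $\beta$ increases from 0 to 1.

Consider the function

$$\phi(\beta, \Delta) = (1-\beta) - F\!\left(-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt{N}} + \frac{\Delta\sqrt{N}}{2}\right) - \frac{1-\beta}{\alpha}\,F\!\left(-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt{N}} - \frac{\Delta\sqrt{N}}{2}\right). \qquad (5)$$

(3) is satisfied if and only if $\phi = 0$.
From (5), after some reduction,

$$\frac{\partial\phi}{\partial\Delta} = -\left[\frac{2(1-\beta)}{\pi\alpha N}\right]^{1/2} \frac{\ln\{(1-\beta)/\alpha\}}{\Delta^2}\, \exp\!\left[-\frac{1}{2}\left\{\frac{(\ln\{(1-\beta)/\alpha\})^2}{\Delta^2 N} + \frac{\Delta^2 N}{4}\right\}\right] \lessgtr 0 \quad \text{according as } \beta \lessgtr 1-\alpha.$$

As a function of $\Delta$, $\phi$ has a discontinuity at $\Delta = 0$. Consideration of the limiting values of
$\phi$ as $\Delta \to \pm\infty$ and $\pm 0$, and of the sign of $\partial\phi/\partial\Delta$, shows that $\phi(\Delta)$ assumes one of two distinct
forms, according as $\beta < 1-\alpha$ or $\beta > 1-\alpha$.

For $\beta < 1-\alpha$, $\phi$ decreases from $(1-\beta)(1-1/\alpha)$ to $(1-\beta)(1-1/\alpha)-1$ as $\Delta$ increases from
$-\infty$ to $-0$, and from $1-\beta$ to $-\beta$ as $\Delta$ increases from $+0$ to $+\infty$.

For $\beta > 1-\alpha$, $\phi$ increases from $(1-\beta)(1-1/\alpha)$ to $1-\beta$ as $\Delta$ increases from $-\infty$ to $-0$,
and from $(1-\beta)(1-1/\alpha)-1$ to $-\beta$ as $\Delta$ increases from $+0$ to $+\infty$.

The solution $\Delta(\beta)$ of (3) must therefore be a single-valued function of $\beta$, satisfying

$$\Delta(\beta) \gtrless 0 \quad \text{as} \quad \beta \lessgtr 1-\alpha.$$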
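To prove the monotonicity of $\Delta(\beta)$, consider $\partial\phi/\partial\beta$. We find, after some reduction,

$$\frac{\partial\phi}{\partial\beta} = -1 + \frac{1}{\alpha}\,F\!\left(-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt{N}} - \frac{\Delta\sqrt{N}}{2}\right) - \frac{1}{\Delta}\left[\frac{2}{\pi\alpha(1-\beta)N}\right]^{1/2} \exp\!\left[-\frac{1}{2}\left\{\frac{(\ln\{(1-\beta)/\alpha\})^2}{\Delta^2 N} + \frac{\Delta^2 N}{4}\right\}\right]. \qquad (6)$$

For $\Delta > 0$, $\beta < 1-\alpha$,

$$u = -\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt{N}} - \frac{\Delta\sqrt{N}}{2} < 0.$$

Hence, using the inequality on Mills’s ratio given by Gordon (1941),

$$F(u) < -u^{-1}(2\pi)^{-1/2} e^{-u^2/2} < (2\Delta^{-1}N^{-1/2})(2\pi)^{-1/2} e^{-u^2/2} = \frac{1}{\Delta}\left[\frac{2\alpha}{\pi(1-\beta)N}\right]^{1/2} \exp\!\left[-\frac{1}{2}\left\{\frac{(\ln\{(1-\beta)/\alpha\})^2}{\Delta^2 N} + \frac{\Delta^2 N}{4}\right\}\right].$$

Hence $\partial\phi/\partial\beta < -1$, and from previous remarks about the behaviour of $\phi$ it immediately
follows that for $\beta < 1-\alpha$ and $\Delta(\beta) > 0$, $\Delta(\beta)$ is a monotonically decreasing function of $\beta$,
the limit of which, as $\beta \to 0$, is $+\infty$.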

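For $\Delta < 0$, $\beta > 1-\alpha$, $\partial\phi/\partial\beta$ cannot have the same sign for all $\Delta$, since the upper and lower
limits, $1-\beta$ and $(1-\beta)(1-1/\alpha)$, converge as $\beta \to 1$. But, for $\phi = 0$, by (5),

$$\frac{1}{\alpha}\,F\!\left(-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt{N}} - \frac{\Delta\sqrt{N}}{2}\right) = 1 - \frac{1}{1-\beta}\{1 - F(v)\}, \qquad (7)$$

where

$$v = \frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt{N}} - \frac{\Delta\sqrt{N}}{2} > 0,$$

and, by Gordon’s inequality,

$$F(v) = 1 - F(-v) > 1 - v^{-1}(2\pi)^{-1/2} e^{-v^2/2} > 1 + \frac{1}{\Delta}\left[\frac{2(1-\beta)}{\pi\alpha N}\right]^{1/2} \exp\!\left[-\frac{1}{2}\left\{\frac{(\ln\{(1-\beta)/\alpha\})^2}{\Delta^2 N} + \frac{\Delta^2 N}{4}\right\}\right].$$

Hence, the left-hand side of (7)

$$> 1 + \frac{1}{\Delta}\left[\frac{2}{\pi\alpha(1-\beta)N}\right]^{1/2} \exp\!\left[-\frac{1}{2}\left\{\frac{(\ln\{(1-\beta)/\alpha\})^2}{\Delta^2 N} + \frac{\Delta^2 N}{4}\right\}\right],$$

and, from (6), $\partial\phi/\partial\beta > 0$.

Hence, and from previous remarks about the behaviour of $\phi$, it follows that for $\beta > 1-\alpha$
and $\Delta(\beta) < 0$, $\Delta(\beta)$ is a monotonically decreasing function of $\beta$, approaching the limiting
value of $-\infty$ as $\beta \to 1$.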

This completes Theorem 1. We have not strictly proved, as part of the theorem, that
$\Delta(\beta) \to 0$ as $\beta \to 1-\alpha$. It is easily seen, writing $1-\beta = \alpha(1+\delta)$ in (3) and letting $\delta \to 0$,
that a solution of (3) is possible only if $\Delta(\beta) = O(\delta)$.

To Theorem 1 we have the following

COROLLARY. Given N, any restricted sequential procedure satisfying $P_1(0, N) = \alpha$ (i.e.
having a specified risk according to the diffusion approximation) is one of the family generated
by the variable $\beta$.

For, the slope b of the boundaries determines a value of $\Delta(\beta)$, and hence of $\beta$, and the
corresponding boundaries must coincide with those given, since a must be a single-valued
function of $P_1(0, N)$.




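The remaining results stated in § 2-3 are included in

THEOREM 2. The ordinate, $a + bN$, at the point of intersection of U and M, has a lower limit
of $u_\alpha\sigma\sqrt{N}$, approached as $\beta \to 1$. As $\beta \to 1-\alpha$, $a + bN \to u_{\alpha/2}\,\sigma\sqrt{N}$.

From (2),

$$\frac{a + bN}{\sigma\sqrt{N}} = \frac{\ln\{(1-\beta)/\alpha\}}{\Delta(\beta)\sqrt{N}} + \frac{\Delta(\beta)\sqrt{N}}{2},$$

and, from (3),

$$1 - F\!\left(\frac{a + bN}{\sigma\sqrt{N}}\right) = F\!\left(\frac{-a - bN}{\sigma\sqrt{N}}\right) = \alpha\left[1 - \frac{1}{1-\beta}\,F\!\left(-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta(\beta)\sqrt{N}} + \frac{\Delta(\beta)\sqrt{N}}{2}\right)\right] < \alpha.$$

Hence $(a + bN)/\sigma\sqrt{N}$ has a lower bound of $u_\alpha$, and tends to $u_\alpha$ as

$$\frac{1}{1-\beta}\,F\!\left(-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta(\beta)\sqrt{N}} + \frac{\Delta(\beta)\sqrt{N}}{2}\right) \to 0,$$

i.e. as $\beta \to 1$, and $\Delta(\beta) \to -\infty$.

The second part of the theorem follows from the result indicated at the end of Theorem 1,
that as $\beta \to 1-\alpha$, $\Delta(\beta) = O(\delta)$, where $1-\beta = \alpha(1+\delta)$. Then $(a + bN)/\sigma\sqrt{N} \sim \delta/\Delta(\beta)\sqrt{N}$,
and (3) becomes

$$1 - \alpha \sim F(\delta/\Delta(\beta)\sqrt{N}) - (1+\delta)\,F(-\delta/\Delta(\beta)\sqrt{N}) \sim 2F(\delta/\Delta(\beta)\sqrt{N}) - 1.$$

Hence $F(\delta/\Delta(\beta)\sqrt{N}) \sim 1 - \tfrac{1}{2}\alpha$, and the result follows.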


3. COMPARATIVE BINOMIAL TRIALS 


3-1. Wald (1947) proposed a sequential method for testing the difference between the
means of two binomial distributions, the non-sequential analogue of which is asymptotically
fully efficient. Suppose that individuals are to be submitted to one of two treatments,
$T_1$ and $T_2$, with ‘success-rates’ $\pi_1$ and $\pi_2$ respectively. Pairs of individuals are allocated
randomly to $T_1$ and $T_2$ and the failure or success of each member of the pair is recorded.
A pair showing success on $T_1$ and failure on $T_2$ will be denoted by SF, and so on. Following
Bross (1952), we shall use the term ‘tied pairs’ for those recorded as SS or FF, and ‘untied
pairs’ for those recorded as FS or SF. Wald’s method is based entirely on the untied pairs.
Let $\theta$ be the probability that an untied pair is an FS. Then

$$\theta = \frac{(1-\pi_1)\pi_2}{\pi_1(1-\pi_2) + (1-\pi_1)\pi_2},$$

$\theta \gtreqless \tfrac{1}{2}$ according as $\pi_2 \gtreqless \pi_1$, and the problem of testing the difference between the two binomial
parameters $\pi_1$ and $\pi_2$ reduces to that of testing the departure of the single binomial parameter
$\theta$ from the value $\tfrac{1}{2}$.

We follow here the same basic approach to the problem of comparing two binomial
parameters $\pi_1$ and $\pi_2$, and consider restricted sequential procedures for sampling the
population of untied pairs, of which proportions $\theta$ and $1-\theta$ are of types FS and SF respectively.
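The reduction from the pair of success-rates to the single parameter $\theta$ is a one-line calculation, sketched here with a hypothetical function name.

```python
def untied_fs_probability(pi1, pi2):
    """theta: probability that an untied pair is FS (failure on T1, success on T2)."""
    fs = (1.0 - pi1) * pi2      # failure on T1, success on T2
    sf = pi1 * (1.0 - pi2)      # success on T1, failure on T2
    return fs / (fs + sf)


print(untied_fs_probability(0.4, 0.6))   # T2 better than T1, so theta > 1/2
```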










3-2. Suppose that, of the first n untied pairs, $n_1$ are of type FS and $n_2$ of type SF. Let
$y_n = 2n_1 - n = n_1 - n_2$. By analogy with the problem treated in § 2-2, we shall first consider
the specification of upper and lower boundaries U and L, and a middle boundary M with
equation $n = N$, such that

(a) on the hypothesis $H_0$ that $\theta = \tfrac{1}{2}$, the probabilities that sampling will end on each of
the outer boundaries are $\alpha < \tfrac{1}{2}$;

(b) on the hypothesis $H_1$ [$H_{-1}$] that $\theta = \theta_1 > \tfrac{1}{2}$ [$\theta = 1-\theta_1 < \tfrac{1}{2}$], the probability of reaching
U [L] is $1-\beta$ $(> \alpha)$. The logarithm of the likelihood ratio of $H_1$ to $H_0$ is

$$l = \tfrac{1}{2}\,y_n \log\{\theta_1/(1-\theta_1)\} - n\log\{\tfrac{1}{2}\theta_1^{-1/2}(1-\theta_1)^{-1/2}\}.$$

As in § 2-2 we arrange that, on U, $l = \log\{(1-\beta)/\alpha\}$. This gives the equations to U and
L as $y_n = a + bn$ and $y_n = -a - bn$, respectively, where

$$a = \frac{2\log\{(1-\beta)/\alpha\}}{\log\{\theta_1/(1-\theta_1)\}} \quad \text{and} \quad b = \frac{\log\{\tfrac{1}{4}\theta_1^{-1}(1-\theta_1)^{-1}\}}{\log\{\theta_1/(1-\theta_1)\}}. \qquad (8)$$

The requirements (a) and (b) will both be satisfied (apart from neglect of the overshoot of
the boundaries) if we choose N to satisfy (b).
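The boundary constants in (8) are easily evaluated. The following sketch (the function name is hypothetical; natural logarithms are used, although the ratios in (8) do not depend on the base) computes a and b for $\alpha = 0.025$, $\beta = 0.05$.

```python
from math import log


def binomial_boundaries(theta1, alpha, beta):
    """Return (a, b) of equations (8) for the boundaries y_n = +/-(a + b n)."""
    k = log(theta1 / (1.0 - theta1))
    a = 2.0 * log((1.0 - beta) / alpha) / k
    b = log(1.0 / (4.0 * theta1 * (1.0 - theta1))) / k
    return a, b


for theta1 in (0.55, 0.75, 0.95):
    a, b = binomial_boundaries(theta1, 0.025, 0.05)
    print(theta1, round(a, 2), round(b, 4))
```

These values may be compared with the columns headed a and b in Table 5.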

Now, for sequential sampling from a binomial population with any boundaries, the
probability of reaching a particular boundary point can be calculated exactly by enumeration
of the number of paths from the origin, reaching that point without previously crossing
a boundary. The required value of N could therefore be obtained by straightforward,
although laborious, calculation of the probabilities, on $H_1$, of reaching the various boundary
points on U. It will be useful, however, to have a manageable approximation to the value
of N, and this can be found by applying to the binomial random walk the diffusion
approximation already used in § 2.
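The exact calculation described here is laborious by hand but simple to mechanize. The sketch below is one possible arrangement, not a transcription of any published algorithm: instead of counting paths it carries forward, for each n, the probability of being at each unabsorbed value of $y_n$. The function names are hypothetical; the example takes $\theta_1 = 0.75$, $\alpha = 0.025$, $\beta = 0.05$ and the maximum of 66 untied pairs shown for this case in Table 5.

```python
from math import log


def binomial_boundaries(theta1, alpha, beta):
    """Boundary constants a and b from equations (8)."""
    k = log(theta1 / (1.0 - theta1))
    return (2.0 * log((1.0 - beta) / alpha) / k,
            -log(4.0 * theta1 * (1.0 - theta1)) / k)


def exact_probs(theta, a, b, n_max):
    """Exact probabilities of ending on U, on L and on M (n = n_max)."""
    alive = {0: 1.0}              # value of y_n -> probability, not yet absorbed
    p_upper = p_lower = 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for y, p in alive.items():
            for step, pr in ((1, theta), (-1, 1.0 - theta)):
                y2 = y + step
                if y2 >= a + b * n:           # reached the upper boundary U
                    p_upper += p * pr
                elif y2 <= -a - b * n:        # reached the lower boundary L
                    p_lower += p * pr
                else:
                    nxt[y2] = nxt.get(y2, 0.0) + p * pr
        alive = nxt
    return p_upper, p_lower, sum(alive.values())


a, b = binomial_boundaries(0.75, 0.025, 0.05)
print(exact_probs(0.5, a, b, 66))    # on H_0
print(exact_probs(0.75, a, b, 66))   # on H_1
```

Because the likelihood ratio at any point on or beyond U is at least $(1-\beta)/\alpha$, the exact probability of reaching U on $H_0$ cannot exceed $\alpha/(1-\beta)$ times that on $H_1$, which provides a check on the output.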


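We have $y_n = \sum_{i=1}^{n} x_i$, where the $x_i$ are independent variates taking values $+1$ and $-1$ with
probabilities $\theta$ and $1-\theta$ respectively. The random walk will therefore be approximated
by a diffusion process with drift $\mu - b$ per unit time, growth in variance at a rate $\sigma^2$, and an
absorbing barrier at a, where

$$\mu = 2\theta - 1 \quad \text{and} \quad \sigma^2 = 4\theta(1-\theta).$$

Requirement (b) will be approximately satisfied then, if (from (1) and the subsequent
discussion) we ensure that

$$1-\beta = F\!\left(\frac{-a + m_1 N}{\sigma_1\sqrt{N}}\right) + \exp\!\left(\frac{2am_1}{\sigma_1^2}\right) F\!\left(\frac{-a - m_1 N}{\sigma_1\sqrt{N}}\right), \qquad (9)$$

where a is given by (8), $\sigma_1^2 = 4\theta_1(1-\theta_1)$, and

$$m_1 = 2\theta_1 - 1 + \frac{\ln\{4\theta_1(1-\theta_1)\}}{\ln\{\theta_1/(1-\theta_1)\}}.$$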








Table 5 gives the solution NV = N, of (9), and the values of a and 6, for « = 0-025, 8 = 0-05 
and various values of 4. 

As in the corresponding problem considered in § 2.2, the solution tabulated in Table 5
provides only one member of a family of procedures approximately satisfying (a) and (b).
(The discreteness of the variate $x$ prevents an exact solution being found in general.) Another
member of the family is a fixed-sample-size procedure with $N = N_0$, say. The choice amongst
neighbouring values of $N$, each providing approximations to $\alpha$ and $\beta$, may not be clear, but
in practice suitable values may be obtained from the following normal approximation:


$$N_0 = \left[\{\theta_1(1-\theta_1)\}^{1/2} u_\beta + \tfrac{1}{2} u_\alpha\right]^2 \Big/ (\theta_1 - \tfrac{1}{2})^2,$$
where, as in § 2.2, $F(u_\alpha) = 1 - \alpha$, $F(u_\beta) = 1 - \beta$.
Values of $N_0$ are given in Table 5. Values of $N_1/N_0$ are shown in the last column of Table 5;
for $\theta_1$ near 0.5, the ratio is close to the normal equivalent 1.368 given in Table 2, as might
be expected.
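The normal approximation for the equivalent fixed sample size is easily evaluated. The sketch below is illustrative only: the function name is hypothetical, and the normal deviates are obtained from Python's standard library rather than from tables.

```python
import math
from statistics import NormalDist

def fixed_sample_size(theta1, alpha, beta):
    """Normal approximation to the equivalent fixed sample size N_0."""
    u_alpha = NormalDist().inv_cdf(1 - alpha)   # F(u_alpha) = 1 - alpha
    u_beta = NormalDist().inv_cdf(1 - beta)     # F(u_beta) = 1 - beta
    numerator = math.sqrt(theta1 * (1 - theta1)) * u_beta + 0.5 * u_alpha
    return (numerator / (theta1 - 0.5)) ** 2

# Rounded values for a few of the alternatives considered in Table 5.
sizes = {t: round(fixed_sample_size(t, 0.025, 0.05))
         for t in (0.60, 0.70, 0.80, 0.95)}
```

With $\alpha = 0.025$ and $\beta = 0.05$ the rounded values for $\theta_1 = 0.60$, 0.70, 0.80 and 0.95 are 319, 75, 30 and 9.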


Table 5. Procedure for comparison of two proportions. Parameters of boundaries; maximum
number of untied pairs, $N_1$; with nearest boundary point at $N_1'$ and equivalent fixed-sample-
size, $N_0$: for $\alpha = 0.025$, $\beta = 0.05$ and various values of $\theta_1$

            Parameters of        Maximum number      Equivalent
            boundaries           of untied pairs     fixed-sample-
  θ_1       a        b           N_1      N_1'       size, N_0       N_1/N_0

  0.55     36.25    0.0501       1778     1778       1294            1.37
  0.60     17.94    0.1007        438      439        319            1.37
  0.65     11.75    0.1524        192      191        138            1.39
  0.70      8.59    0.2058        105      104         75            1.40
  0.75      6.62    0.2619         65       66         46            1.41
  0.80      5.25    0.3219         43       44         30            1.43
  0.85      4.19    0.3882         30       30         20            1.50
  0.90      3.31    0.4650         21       22         14            1.50
  0.95      2.47    0.5640         14       15          9            1.56





The outer boundaries $U$ and $L$ can be crossed only at a discrete set $\{\mathcal{N}\}$ of values of $n$,
and it would seem reasonable to place $M$ at $n = N_1'$, where $N_1'$ is the member of $\{\mathcal{N}\}$ nearest
to $N_1$. Values of $N_1'$ are given in Table 5.

A second modification is to replace M by the wedge-shaped boundary M’ illustrated in 
Fig. 2. Any path crossing M’ must also, if continued, cross M rather than U or L. Hence, 
the replacement of M by M’ does not affect the probabilities of reaching any boundary 
point on $U$ or $L$, whereas the average sample number for any $\theta$ must be reduced.


3.3. As in the similar problem considered in § 2.3, we might wish to specify in advance
the maximum number of untied pairs, $N$, in addition to the probability $\alpha$. As in § 2.3, we
could generate a family of procedures satisfying these requirements, by solving (9) for $\theta_1$,
for any chosen value of $\beta$. This has not been carried out, but for $\alpha = 0.025$ one member of
the family (corresponding to $\beta = 0.05$) can be obtained from Table 5, by interpolating for
the specified value of $N$ in the column headed $N_1$.

It would perhaps be possible to prove results analogous to those in § 2.4, but such results
have not yet been obtained.


3.4. The theoretical basis proposed above, for restricted sequential procedures to com-
pare binomial variates, involves a number of approximations: the neglect of the effect of
one outer boundary on the probabilities of reaching the other; the use of diffusion theory
to represent a discrete sampling process; and the application of normal distribution theory
to a problem involving a binomial variate. As a check on the validity of the theory, some
exact probabilities have been calculated for the procedure tabulated in Table 5, with
$\alpha = 0.025$, $\beta = 0.05$, $\theta_1 = 0.8$, $N_1' = 44$. The boundaries, including the modified middle
boundary $M'$, are shown in Fig. 2. The probabilities of reaching the various boundary
points, for $\theta = 0.5$ and 0.8, are given in Table 6.


[Figure. Vertical axis: $y_n$, number of FS pairs minus number of SF pairs. Horizontal axis: number of untied pairs, $n$.]

Fig. 2. Comparison of two proportions: boundaries for procedure with $\alpha = 0.025$, $\beta = 0.05$,
$\theta_1 = 0.8$ and $N = 44$. Boundary points shown by circles.


The exact probabilities were calculated by enumeration of paths (Stockman & Armitage,
1946), for various values of $\theta$. As in (9), the diffusion approximation gives, for the probability
of crossing $U$ before $n = n'$,
$$1 - F\left(\frac{a - mn'}{\sigma\sqrt{n'}}\right) + \exp\left(\frac{2am}{\sigma^2}\right) F\left(\frac{-a - mn'}{\sigma\sqrt{n'}}\right), \tag{10}$$
where $a = 5.248$, $\sigma^2 = 4\theta(1-\theta)$, and $m = 2\theta - 1.3219$.





Table 6 compares the theoretical and actual probability distributions on the thirteen
boundary points of $U$, for $\theta = 0.8$ and 0.5. The cumulative distributions are shown in
Fig. 3. In these comparisons the theoretical probability of crossing between two adjacent
boundary points $n_1'$ and $n_2'$ is taken as an approximation to the actual probability for the
higher value, $n_2'$; the cumulative distributions therefore give the theoretical and exact
probabilities of crossing $U$ in not more than $n'$ observations, for various $n'$. The agreement
seems to be fairly good for $\theta = 0.8$, although there is a noticeable excess of theoretical over
exact values for $\theta = 0.5$. This is to some extent the effect (familiar in Wald's procedures)


Table 6. Procedure for comparison of two proportions, with $\alpha = 0.025$, $\beta = 0.05$ and $\theta_1 = 0.80$.
Probability distribution on each boundary, for $\theta = 0.5$ and 0.8, together with the theoretical
approximation to the distribution on the upper boundary, $U$, given by (10)

Probability of crossing boundary at $n'$th untied pair

            θ = 0.5                        θ = 0.8
            Upper and lower boundaries     Upper boundary               Lower boundary
  n'        Exact       Theoretical        Exact       Theoretical      (exact)

   8        0.00391     0.00871            0.1678      0.1366           0.0000000
  11        0.00391     0.00565            0.1718      0.1515           0.0000000
  14        0.00317     0.00457            0.1429      0.1454           0.0000000
  17        0.00244     0.00351            0.1126      0.1230           0.0000000
  20        0.00185     0.00266            0.0873      0.0987           0.0000000
  23        0.00140     0.00202            0.0675      0.0774           0.0000000
  26        0.00106     0.00154            0.0523      0.0600           0.0000000
  29        0.00080     0.00118            0.0407      0.0464           0.0000000
  32        0.00061     0.00091            0.0318      0.0358           0.0000000
  35        0.00047     0.00070            0.0250      0.0276           0.0000000
  38        0.00036     0.00054            0.0197      0.0213           0.0000001
  41        0.00028     0.00043            0.0156      0.0165           0.0000007
  44        0.00022     0.00034            0.0124      0.0129           0.0000026

  Total,
  n' ≤ 44   0.02047     0.03275            0.9473      0.9532           0.0000034

Middle boundary (exact), points in order from the upper portion to the lower portion

            θ = 0.5     θ = 0.8

            0.0090      0.0270
            0.0824      [illegible]
            0.3108      0.0053
            0.1546      0.0005
            0.3108      0.0001
            0.0824      0.0000
            0.0090      0.0000

  Total,
  26 ≤ n' ≤ 44   0.9590      0.0526













of the overshooting of the boundaries, which ensures that at each boundary point the
likelihood ratio of $H_1$ to $H_0$ exceeds the nominal value of $(1-\beta)/\alpha$.*
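The theoretical distribution on the boundary points of $U$ can be sketched by differencing the cumulative diffusion approximation between adjacent boundary points, each increment being assigned to the higher point. The code below is an illustration under the constants quoted in the text ($a = 5.248$, $b = 0.3219$); the function names are hypothetical and $F$ is taken to be the standard normal distribution function.

```python
import math

A, B = 5.248, 0.3219                      # boundary constants for theta_1 = 0.8

def norm_cdf(x):
    """Standard normal distribution function F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cumulative_crossing(theta, n):
    """Diffusion approximation to P(U is crossed within n untied pairs)."""
    m = 2 * theta - 1 - B
    sigma2 = 4 * theta * (1 - theta)
    s = math.sqrt(sigma2 * n)
    return (1 - norm_cdf((A - m * n) / s)
            + math.exp(2 * A * m / sigma2) * norm_cdf((-A - m * n) / s))

def theoretical_distribution(theta, points):
    """Assign each increment of the cumulative curve to the higher point."""
    probs, previous = {}, 0.0
    for n in points:
        current = cumulative_crossing(theta, n)
        probs[n] = current - previous
        previous = current
    return probs

points = range(8, 45, 3)                  # the thirteen boundary points of U
dist_null = theoretical_distribution(0.5, points)
dist_alt = theoretical_distribution(0.8, points)
```

The first entries and the totals of `dist_null` and `dist_alt` may be set against the theoretical columns of the comparison table.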

The total probabilities, theoretical and actual, of reaching $U$ for various values of $\theta$ are
shown in Table 7. The agreement is again fairly good. Table 7 gives also the exact average














[Figure. Two panels, (a) $\theta = 0.5$ and (b) $\theta = 0.8$; vertical axis: probability; horizontal axis: $n'$.]

Fig. 3. Comparison of two proportions: procedure with $\alpha = 0.025$, $\beta = 0.05$, $\theta_1 = 0.8$ and $N = 44$.
Probability of reaching upper boundary with not more than $n'$ untied pairs, (a) when $\theta = 0.5$
and (b) when $\theta = 0.8$. The step function shows exact values, and the continuous function the
theoretical diffusion approximation.


Table 7. Procedure for comparison of two proportions, with $\alpha = 0.025$, $\beta = 0.05$, $\theta_1 = 0.80$,
and $N = 44$. Exact and theoretical probabilities of reaching $U$, and exact average and
variance of sample number, for various values of $\theta$

            Probability of reaching U       Sample number
  θ         Exact       Theoretical         Mean       Variance

  0.50      0.0205      0.0327              29.21       19.83
  0.60      0.1490      0.1850              30.09       46.06
  0.70      0.5623      0.5860              27.90      115.55
  0.80      0.9473      0.9532              18.81       97.2
  0.85      0.9941      0.9965              14.46       49.42
  0.90      0.9999      1.0000              11.43       18.80
  0.95      1.0000      1.0000               9.41        5.57











and variance of the sample number for various values of $\theta$; these relate to the procedure
with the modified boundary, $M'$. The ASN curve is shown in Fig. 4, together with the
approximate ASN curve for a two-sided modification of a Wald sequential procedure with
the same nominal values of $\alpha$, $\beta$ and $\theta_1$, and the equivalent fixed-sample-size of $N_0 = 30$.

* A very much better approximation to the probabilities for $\theta = 0.5$ is obtained by calculating the
likelihood ratio of $H_1$ to $H_0$ at each boundary point, and dividing the theoretical probability for
$\theta = 0.8$ by this ratio.










(In the fixed-sample-size procedure with $N_0 = 30$ the boundaries $U$, $M$ and $L$ correspond
to the results $n_1 \ge 21$, $10 \le n_1 \le 20$, and $n_1 \le 9$, respectively; the probabilities of reaching
$U$ on $H_0$ and $H_1$ are then 0.021 and 0.939, which agree reasonably well with the corresponding
values in Table 7.)
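The two probabilities quoted for the fixed-sample-size procedure are upper binomial tails and can be checked directly. The sketch below is illustrative; the function name is hypothetical.

```python
from math import comb

def upper_tail(n, k, theta):
    """P(n_1 >= k) when n_1 is binomial with index n and parameter theta."""
    return sum(comb(n, j) * theta ** j * (1 - theta) ** (n - j)
               for j in range(k, n + 1))

p_null = upper_tail(30, 21, 0.5)   # probability of reaching U on H_0
p_alt = upper_tail(30, 21, 0.8)    # probability of reaching U on H_1
```

To three decimal places these are 0.021 and 0.939.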


3.5. The procedures described above may be used for other situations in which repeated
qualitative comparisons are made between two treatments. For example, two medical
treatments may be placed in order of effectiveness for each of a number of subjects. An
'untied pair' would then correspond to a subject giving a definite preference for one or
other treatment, and $n_1$ and $n_2$ would be the numbers of preferences in favour of the first
and second treatments, respectively.


[Figure. Vertical axis: average sample number. Horizontal axis: $\theta$, from 0 to 1.0.]

Fig. 4. Comparison of two proportions: average sample number (ASN) of untied pairs as a function
of $\theta$, for three procedures with approximately $\alpha = 0.025$, $\beta = 0.05$, $\theta_1 = 0.8$. Chain line: two-sided
Wald scheme (Armitage, 1954; ASN for $\theta = 0.5$ from de Boer's (1953) method, other values
approximated by formula for one-sided scheme (Wald, 1947)). Solid line: restricted sequential pro-
cedure with $N = 44$ (exact values). Broken line: fixed-sample-size procedure with $N_0 = 30$.


4. NORMAL DISTRIBUTION WITH UNKNOWN VARIANCE 


A restricted procedure based on the sequential t-test could theoretically be obtained by 
using the upper boundary of a two-sided sequential t-test (National Bureau of Standards, 
1951), and terminating the procedure after a fixed number, N, of observations, choosing 
$N$ in such a way that the probability, on the null hypothesis, of reaching the upper boundary
took some specified value. In practice, no method of determining $N$ appears to be available,
since nothing is known of the distribution of sample size on sequential $t$ boundaries.

Suppose that observations $x_i$ are drawn from a normal distribution with mean $\mu$ and
variance $\sigma^2$. The sequential procedure is based on the statistic
$$\left(\sum_{i=1}^{n} x_i\right)^2 \Big/ \left(\sum_{i=1}^{n} x_i^2\right).$$










Suppose that a restricted sequential procedure is required, with an upper boundary $U$ and
a terminating boundary $M$ (at $n = N$), satisfying the conditions:

(a) on the hypothesis $H_0$ that $\mu = 0$, the probability that sampling will end on $U$ is $2\alpha$;

(b) on the hypothesis $H_1$ [$H_{-1}$] that $\mu/\sigma = A > 0$ [$\mu/\sigma = -A < 0$], the probability of reaching
$U$ is $1 - \beta$.

If $U$ is chosen to be the upper boundary of a sequential t-test, with risks $2\alpha$ and $\beta$ and the
appropriate value for $A$, the probabilities required by (a) and (b) will be approximately in
the correct ratio. In the absence of any sound method of determining $N$, one might determine
the size of the equivalent fixed sample, say $N_0$, and conjecture that $N/N_0$ takes approxi-
mately the same value as when $\sigma^2$ is known. This value is given in terms of $\alpha$ and $\beta$ in Table 2.


5. SIGNIFICANCE TESTING AND ESTIMATION 


In the literature on sequential tests of significance, it has almost invariably been assumed 
that the object of a significance test is to state whether or not a null hypothesis is disproved 
at some fixed level of probability (say, 5%). In fixed-sample-size tests, however, it is
usually regarded as more useful to indicate the exact probability at which a particular
result is just significant; that is, the probability of obtaining, in repeated sampling under
specified conditions, a result at least as extreme (in some sense) as that observed. There
seems no reason why this practice should not be followed with sequential experimentation,
provided that the boundary points can be ordered, in some reasonable way, in terms of the
apparent deviation from the null hypothesis.

In the type of sequential procedure considered in this paper, such an ordering can be
made with little ambiguity. Suppose that the procedure described in § 2 is used to test the
null hypothesis that $\mu = 0$. If there were no overshooting of the boundaries the sample
mean, $\bar{x}$, when the procedure terminated, would increase monotonically as the boundary
point moved round the boundaries from low to high values of $n$ on $L$, along $M$, and from high
to low values of $n$ on $U$. If $|\bar{x}|$ is used as an estimate of departure from the null hypothesis,
the probability level corresponding to any boundary point, on a two-sided significance test,
will be the probability (on the null hypothesis) of reaching boundary points with higher
values of $|\bar{x}|$ than that observed.

Some ambiguity is caused by the discrete steps in $n$, but it would be easy to formulate
reasonable ordering rules. For example, if the procedure terminated at a sample size $n'$,
the possible pairs of $(n', \bar{x})$ could be arranged in decreasing order of significance as follows:
(i) all results with $n' = 1$, in decreasing order of $|\bar{x}|$; (ii) all results with $n' = 2$, in decreasing
order of $|\bar{x}|$; and so on. Alternatively, results could be arranged in order of $|\bar{x}|$ without
reference to $n'$.

In the binomial procedures of § 3, also, slight ambiguity is caused by the overshooting
of the boundaries, so that the proportion of FS pairs at the termination, $\hat\theta$, may not always
increase monotonically as the boundary point moves from one extreme to the other. The
simplest rule here would be to arrange the points in terms of the natural order round the
boundaries. Thus, on the outer boundaries, smaller values of $n'$ would represent more
extreme results, and on the modified middle boundary the results would be ordered
according to $|\hat\theta - \tfrac{1}{2}|$. As is shown by Table 6, the diffusion theory may not provide an
adequate approximation to the required probability, and it would be preferable to calculate
it exactly.
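As an illustration of the ordering rule for the outer boundaries, the sketch below accumulates the exact null probabilities of Table 6 (those for $\theta = 0.5$, which by symmetry apply to both $U$ and $L$) to give a two-sided probability level for a result ending on an outer boundary at $n'$. The function name is hypothetical, and the convention of counting the observed point itself as "at least as extreme" is an assumption of this sketch.

```python
# Exact probabilities, on the null hypothesis, of ending on U at each
# boundary point (Table 6, theta = 0.5); the same values apply to L.
NULL_PROB_OUTER = {
    8: 0.00391, 11: 0.00391, 14: 0.00317, 17: 0.00244, 20: 0.00185,
    23: 0.00140, 26: 0.00106, 29: 0.00080, 32: 0.00061, 35: 0.00047,
    38: 0.00036, 41: 0.00028, 44: 0.00022,
}

def two_sided_level(n_stop):
    """Probability level for a result ending on U or L at n' = n_stop.

    Points with smaller n' are treated as more extreme, and the
    observed point is included in the sum.
    """
    one_side = sum(p for n, p in NULL_PROB_OUTER.items() if n <= n_stop)
    return 2 * one_side

level_at_14 = two_sided_level(14)
```

A result ending on an outer boundary at $n' = 14$ would, on this convention, be significant at about the 2.2% level.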













This approach may be applied to other types of sequential procedure, in particular to 
Wald sequential tests, provided that the probabilities, on the null hypothesis to be tested, 
of reaching the individual boundary points are known. 

If these probability distributions are known it is possible in principle to examine the 
properties of point estimators of unknown parameters, and to formulate rules for obtaining 
confidence intervals. In general, intervals obtained from the usual fixed-sample-size 
formulae will not yield the nominal confidence coefficients in repeated sampling with sequen- 
tial stopping rules. Some numerical investigations on a number of sequential procedures 
for binomial variates will be reported in another paper. 


I am indebted to Miss I. Allen for computational help. 


REFERENCES 


ANSCOMBE, F. J. (1954). Fixed-sample-size analysis of sequential observations. Biometrics, 10, 89-100.

ARMITAGE, P. (1954). Sequential tests in prophylactic and therapeutic trials. Quart. J. Med. 23,
255-74.

BAKER, A. G. (1950). Properties of some tests in sequential analysis. Biometrika, 37, 334-46.

BARTLETT, M. S. (1946). The large-sample theory of sequential tests. Proc. Camb. Phil. Soc. 42, 239-44.

BROSS, I. (1952). Sequential medical plans. Biometrics, 8, 188-205.

DE BOER, J. (1953). Sequential test with three possible decisions for testing an unknown probability.
Appl. Sci. Res. B, 3, 249-59.

FELLER, W. (1950). An Introduction to Probability Theory and its Applications. New York: Wiley.

GORDON, R. D. (1941). Values of Mills' ratio of area to bounding ordinate and of the normal prob-
ability integral for large values of the argument. Ann. Math. Statist. 12, 364-6.

NATIONAL BUREAU OF STANDARDS (1951). Tables to Facilitate Sequential t-Tests. Applied Mathematics
Series 7. Washington: Government Printing Office.

PAGE, E. S. (1957). On problems in which a change in a parameter occurs at an unknown point.
Biometrika, 44, 248-52.

RAO, C. R. (1950). Sequential tests of null hypotheses. Sankhyā, 10, 361-70.

SNELL, E. S. & ARMITAGE, P. (1957). A clinical comparison of diamorphine and pholcodeine as cough
suppressants by a new method of sequential analysis. Lancet (in the Press).

STOCKMAN, C. M. & ARMITAGE, P. (1946). Some properties of closed sequential schemes. J. R. Statist.
Soc. Suppl. 8, 104-12.

WALD, A. (1947). Sequential Analysis. New York: Wiley.
















ON THEORETICAL MODELS FOR COMPETITIVE AND 
PREDATORY BIOLOGICAL SYSTEMS 


By M. S. BARTLETT


University of Manchester 


1. PREAMBLE* 


Differences of opinion are obviously possible on the degree to which admittedly over- 
simplified theoretical models can explain some of the complex observational phenomena 
to be found in nature. Criticisms from biologists of the mathematical work of Lotka (1925), 
Volterra (1926) and subsequent writers on the growth and interaction of biological popula- 
tions have, however, sometimes been justified and sometimes unjustified, for in spite of 
inevitable limitations such work constitutes a permanent contribution to the understanding 
of how populations may behave. A significant constructive survey was made by Gause 
(1934) when he attempted to bridge the gap between theoretical models and natural bio- 
logical phenomena by controlled laboratory experiments in animal ecology. While experi- 
ments also have limitations as representations of nature, the role of both theory and 
experiment in the physical sciences might be recalled by any biologists inclined to be 
sceptical of the value of either. 

The interrelation of these approaches in biology may be illustrated in the field of epi- 
demiology. Here the vicissitudes of infected populations have been studied in the laboratory 
as well as in the field; but, as I have emphasized elsewhere (Bartlett, 1956, 1957), the pro- 
perties of theoretical models indicate, among other things, the extent to which population 
size may sometimes be crucial in the probable sequence of events, and thus indicate to what 
extent laboratory observations will have any similarity to larger-scale field observations 
even if the same model applies to both. An essential point is that recent theoretical formula- 
tions explicitly recognize the discrete character of populations and the stochastic or random 
aspect of changes, as distinct from strictly deterministic formulations. The need for this in 
ecology, which was already perhaps envisaged by Gause (1934, p. 124), is quite apparent 
in the experiments by Park with the flour-beetle Tribolium, in which one of two species
together in a container survived not every time, but with a definite probability (e.g. 30% 
of times), that could be estimated by replication and changed by changing the environment 
(see, for example, Neyman, Park & Scott, 1956). 

There is now no mathematical difficulty in the formulation of stochastic models (see, for 
example, Bartlett, 1955a, 1956), and such formulations have already been made for typical 
ecological models by Chin Long Chiang (see Kempthorne et al. 1954). The greater intract-
ability of even the simplest of these is, however, a serious obstacle to progress, especially
in animal ecology, where even in the deterministic formulations of Lotka and Volterra many
simplifications, such as neglect of age structure or of other heterogeneity, were made. One
aim of the ensuing discussion is to indicate the enhanced value of deterministic formulations 
of population dynamics when properly interpreted within more comprehensive stochastic 

* Some of these introductory comments were also included in a survey paper, 'Some applications of
probability and statistical theory in biology', given at the Third Soviet Mathematical Congress held in
Moscow in 1956.










models. After some remarks on the logistic model of population growth for a single species, 
two problems are referred to: (i) the classical Lotka—Volterra predator-prey relation, 
(ii) competition between two species, with special reference to the competition between 
the two species of flour-beetle, Tribolium confusum and Tribolium castaneum, as investigated
by Park. 


2. THE LOGISTIC MODEL 


The properties of the logistic law of growth, introduced by Verhulst (see Andrewartha
& Birch, 1954) and much discussed in the literature, need not be recapitulated here. In its
deterministic form, the rate of increase of a population of size $N$ is assumed to be
$$DN = \alpha N - \beta N^2, \tag{1}$$


where $D = d/dt$ denotes differentiation with respect to the time $t$. It is well known that this
simple equation ignores deterministic oscillations arising from age structure, and in a
complete discussion it would be advisable to investigate the amplitude of oscillations in
stochastic formulations, due to such a complication, along the lines to be developed for the
simpler case above. However, the main point I wish to make is that the rate of increase
given in (1) is a net balance of births and deaths, and many different stochastic models
compatible with (1) are possible (cf. Kendall, 1949). In the extreme case deaths are negli-
gible compared with births, and the chance of a birth in time $dt$ (independently of previous
events) will be assumed to be $(\alpha N - \beta N^2)\,dt + o(dt)$.* The difference between this stochastic
model and (1) has been already studied in some detail (see Feller, 1939; Bailey, 1950).
The asymptotic value for the population size is in this case still fixed at $\alpha/\beta$. But if we
assume that the chance of a birth in time $dt$ is $(\alpha_1 N - \beta_1 N^2)\,dt$ and of a death $(\alpha_2 N + \beta_2 N^2)\,dt$,
then we have four coefficients connected by the relations
$$\alpha_1 - \alpha_2 = \alpha, \qquad \beta_1 + \beta_2 = \beta. \tag{2}$$
In a small interval $dt$, we may write
$$\delta N = (\alpha N - \beta N^2)\,dt + \delta Z_1 - \delta Z_2, \tag{3}$$


where, as the first term on the right-hand side of (3) represents the systematic part of the
random or stochastic change $\delta N$ (which can only be 0 or $\pm 1$ as $dt \to 0$), $\delta Z_1$ and $\delta Z_2$ are in-
dependent (modified) Poisson variables with zero means, and variances $(\alpha_1 N - \beta_1 N^2)\,dt$,
$(\alpha_2 N + \beta_2 N^2)\,dt$ respectively. To illustrate the use of equation (3), consider the asymptotic
situation when $N \sim \alpha/\beta$; put, more precisely, $N = \alpha(1+u)/\beta$. Then
$$du = -\alpha(1+u)u\,dt + \beta(\delta Z_1 - \delta Z_2)/\alpha. \tag{4}$$
This non-linear stochastic equation can be approximated for small $u$ by
$$du = -\alpha u\,dt + \delta Z, \tag{5}$$
where $\delta Z$ has variance $\{(\alpha_1 + \alpha_2)\beta/\alpha + \beta_2 - \beta_1\}\,dt = \gamma\,dt$, say. In any 'steady state' the
quantity $u$ at time $t$ has the same statistical properties as the quantity $u + du$ at time $t + dt$;
this gives for the variance $\sigma^2$ of $u$, on squaring and averaging the equation for $u + du$
obtained from (5),
$$\sigma^2 = \sigma^2(1 - 2\alpha\,dt) + \gamma\,dt,$$


* This is the simplest probability assumption which for large $N$ is equivalent to (1). The last term
$o(dt)$, denoting a remainder $R$ such that $R/dt \to 0$ as $dt \to 0$, is inserted for strict rigour, but is for con-
venience omitted in subsequent formulae.










whence $\sigma^2 = \tfrac{1}{2}\gamma/\alpha$. Strictly, as the value $N = 0$ represents an 'absorbing state' (and the
value $N = \alpha_1/\beta_1$ an upper limit which cannot be exceeded), it follows from the theory of
finite Markov chains that the ultimate state is N = 0, provided this state is accessible from 
any other admissible value of $N$ (i.e. $\alpha_2$ and $\beta_2$ positive). However, it seems evident from the
result above for $\sigma^2$ that the chance of extinction, once the value of $N$ is in the neighbourhood
of the value $\alpha/\beta$, may be neglected for any given time interval, provided $\gamma/\alpha$ is sufficiently
small. Under such conditions the population will thus, in contrast with the pure birth 
process, continue to show fluctuations with this variance. Clearly, before persistent 
fluctuations observed in real animal populations are considered incompatible with the 
logistic model, their size and characteristics should be compared with fluctuations pre- 
dicted from stochastic models like the above. It is essential for this comparison, in the reporting 
of laboratory experiments, that individual replications be recorded separately, and also that the 
total size of the population (rather than its density) be noted. 
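The steady-state variance obtained above can be illustrated numerically. The sketch below computes $\gamma$ and $\sigma^2 = \tfrac{1}{2}\gamma/\alpha$ from the four birth and death coefficients, and confirms that repeated application of the variance recursion settles at the same value. The coefficient values and function names are hypothetical, chosen only for illustration.

```python
import math

def steady_state(alpha1, alpha2, beta1, beta2):
    """Return alpha, beta, gamma and the steady-state variance of u."""
    alpha = alpha1 - alpha2               # relations (2)
    beta = beta1 + beta2
    gamma = (alpha1 + alpha2) * beta / alpha + beta2 - beta1
    return alpha, beta, gamma, gamma / (2 * alpha)

def iterate_variance(alpha, gamma, dt=1e-3, steps=20000):
    """Apply sigma^2 <- sigma^2 (1 - 2 alpha dt) + gamma dt repeatedly."""
    var = 0.0
    for _ in range(steps):
        var = var * (1 - 2 * alpha * dt) + gamma * dt
    return var

# Hypothetical coefficients giving alpha = 1, beta = 0.01, so N is near 100.
alpha, beta, gamma, var_u = steady_state(1.2, 0.2, 0.005, 0.005)
var_iterated = iterate_variance(alpha, gamma)
sd_population = (alpha / beta) * math.sqrt(var_u)   # since N = alpha (1 + u) / beta
```

With these illustrative values the standard deviation of $N$ about its asymptotic level of 100 is roughly 8, which shows the scale of fluctuation a replicated experiment would need to resolve.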

In the discussion of the logistic curve given by Andrewartha & Birch (1954), oscillatory 
fluctuations in the total population are depicted from experiments on Tribolium, the weevil
Calandra oryzae and the cladoceran genus Daphnia magna; and the effect of a more complex 
life history than assumed for the logistic model is, as the authors note, a possible con- 
tributory cause of these. The existence of damped oscillations in a deterministic model with 
an age structure has been noted by Leslie (1948), and the further important point is that in 
a stochastic model such oscillations can maintain themselves even in the steady state, with 
an amplitude depending on the population size. In the simple one age-group case above, the 
fluctuations have no true oscillatory character; in a multi-stage population, oscillatory 
tendencies arise if the roots determining the behaviour of small fluctuations about the
steady state are complex-valued (such investigation of the nature of fluctuations is
illustrated below in the two further problems to be discussed). In the case of bisexual repro-
duction, the further complication of unequal numbers of the two sexes should also strictly 
be taken into account. 


3. THE CLASSICAL PREY-PREDATOR SYSTEM 


The simplest deterministic Lotka–Volterra equations for a prey-predator system are
$$\left.\begin{aligned} DH &= (\alpha_1 - \beta_1 P)H, \\ DP &= (-\alpha_2 + \beta_2 H)P, \end{aligned}\right\} \tag{6}$$
where $H$ denotes the population size of the 'prey' (or 'hosts'), and $P$ the size of the
'predators' (or 'parasites'). When I refer to these equations as classical, no implication is
intended that they represent precisely any real biological systems. They are important in
representing the simplest theoretical model of prey-predator interaction that can be 
specified, and as a prelude to further discussion their main consequences are recalled. 


We have
$$\frac{dH}{dP} = \frac{(\alpha_1 - \beta_1 P)H}{(-\alpha_2 + \beta_2 H)P},$$
whence by integration
$$f(H, P) = -\alpha_2 \log H + \beta_2 H - \alpha_1 \log P + \beta_1 P = \text{constant}. \tag{7}$$
The curves represented by (7) are closed cycles. The equilibrium point is given by $P_0 = \alpha_1/\beta_1$,
$H_0 = \alpha_2/\beta_2$, but this is neutral, in the sense that there is no damping towards this point if









the system is at any other point on the $H$, $P$ graph. For small cycles around $(H_0, P_0)$, it is
easily shown that the path is the ellipse
$$\alpha_2 h^2 + \alpha_1 p^2 = \text{constant}, \tag{8}$$
where $H = H_0(1+h)$, $P = P_0(1+p)$. For larger cycles, the path is restricted by the axes
H = 0, P = 0, but it is obvious that any stochastic formulation will lead to fluctuations 
which will be especially important near either H = 0 or P = 0, for if either H or P becomes 
zero the oscillatory character of the system disappears. Thus, while modified deterministic 
formulations, either by way of more complicated assumptions in models like (6), or (by 
A.J. Nicholson and V. A. Bailey; see, for example, Bach & Smith, 1941) in terms of discrete 
generations, have led to unstable systems, we see that even (6), when incorporated into a 
stochastic model, is ultimately unstable, for the drift due to stochastic fluctuations will lead 
sooner or later to the total extinction of the predator species, either (i) before the other 
species or (ii) by starvation due to the extinction of the prey first. 
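The neutral character of the deterministic cycles can be illustrated by integrating (6) numerically and checking that the function $f(H, P)$ of (7) stays constant along the path. The sketch below uses a fourth-order Runge-Kutta step; the coefficient values, starting point, step length and function names are assumptions made for illustration.

```python
import math

A1, B1, A2, B2 = 1.0, 0.1, 0.5, 0.02      # illustrative coefficients

def deriv(h, p):
    """Right-hand sides of the deterministic equations (6)."""
    return (A1 - B1 * p) * h, (-A2 + B2 * h) * p

def invariant(h, p):
    """The function f(H, P) of (7), constant along any solution."""
    return -A2 * math.log(h) + B2 * h - A1 * math.log(p) + B1 * p

def rk4_step(h, p, dt):
    k1h, k1p = deriv(h, p)
    k2h, k2p = deriv(h + 0.5 * dt * k1h, p + 0.5 * dt * k1p)
    k3h, k3p = deriv(h + 0.5 * dt * k2h, p + 0.5 * dt * k2p)
    k4h, k4p = deriv(h + dt * k3h, p + dt * k3p)
    return (h + dt * (k1h + 2 * k2h + 2 * k3h + k4h) / 6,
            p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6)

def integrate(h, p, dt, steps):
    for _ in range(steps):
        h, p = rk4_step(h, p, dt)
    return h, p

h_end, p_end = integrate(25.0, 2.0, 0.001, 20000)
drift = invariant(h_end, p_end) - invariant(25.0, 2.0)
small_cycle_period = 2 * math.pi / math.sqrt(A1 * A2)
```

The quantity `drift` should be negligible, reflecting a closed cycle with no damping towards the equilibrium point; any systematic change in $f$ in a stochastic version of the model therefore comes from the random fluctuations alone.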

To illustrate this point, a stochastic model was set up compatible with (6). There is the 
same difficulty as with the logistic model that many different birth- and death-rates are 
consistent with the same net rates of increase of H and P, but (corresponding roughly with 
the conditions in some actual situations) $\alpha_1$ may be interpreted for simplicity as a pure
birth-rate for $H$, and $\alpha_2$ as a pure death-rate for $P$ in the absence of $H$. The relation of the
second term $\beta_2 HP$ in the second equation with the death-rate $\beta_1 P$ for $H$ is not so immediate
as in the case of epidemic theory, where susceptibles when in contact with infecteds turn
into more infecteds, but will be assumed to be a consequence of an increase $\beta_2 H$ in the
birth-rate for the predators. (The precise assumptions made are given later below.)

It should be added at this stage that these simplified assumptions
are likely to be even less realistic in the case of the type of system now under consideration
than in, say, epidemic theory, for the basic new assumption of randomness introduced into 
the deterministic equations will often be somewhat dubious (cf. Andrewartha & Birch, 
1954, p. 415). This will be the case when the predator is attracted to the prey, for example, 
by sight or smell; even then, however, the concept of random encounters may become less 
artificial in a natural environment, where the predator may be temporarily outside the range 
of such an attraction. Of course the modifying effect of distance and spatial variability in 
numbers is a further important consideration (cf. Bartlett, 1956), but this will not be con- 
sidered here. 

The constants chosen for the illustrative stochastic model were 


$$\alpha_1 = 1, \quad \beta_1 = 0.1, \quad \alpha_2 = 0.5, \quad \beta_2 = 0.02,$$
leading to P, = 10, H, = 25. An artificial realization was started with H = 25, P = 2, and 
developed* by standard ‘Monte Carlo’ technique (cf., for example, Bartlett, 1955, p. 131) 
until extinction of either H or P had occurred. The series so obtained is graphed in Figs. 1 
and 2. At first it showed little stochastic drift, and in spite of low values of H at ‘time’ 14, 
and P at time 8, persisted with similar cycles until the rather sudden switch to a smaller 


* To correspond to $\beta_2/\beta_1 = \tfrac{1}{5}$, the predator population was assumed to increase by one for only one
in five times that a prey was exterminated. The precise asymptotic transition probabilities assumed were
thus:

    H, P  to  H+1, P        H dt
              H, P−1        ½P dt
              H−1, P        0.08HP dt
              H−1, P+1      0.02HP dt







and less well-defined cycle and then to final extinction of P. Qualitatively, this series 
shows many of the characteristics that have been observed in small-scale laboratory 
experiments—variation in amplitude of the cycle, possible extinction first of H (and then P), 
or of P directly (and subsequent unlimited increase of H). 
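An artificial realization of this kind can be sketched with the four transition rates $H$, $\tfrac{1}{2}P$, $0.08HP$ and $0.02HP$ stated for the model. The code below is only an illustration of the general Monte Carlo idea, not a reconstruction of the original computation: it draws exponential waiting times between events (an assumption of this sketch), and the function name, random seed and time limit are hypothetical.

```python
import random

def simulate(h, p, t_max, rng):
    """Simulate the prey-predator process until extinction or time t_max."""
    t = 0.0
    path = [(t, h, p)]
    while h > 0 and p > 0 and t < t_max:
        rates = (1.0 * h,          # prey birth:      H -> H + 1
                 0.5 * p,          # predator death:  P -> P - 1
                 0.08 * h * p,     # prey killed, no predator birth
                 0.02 * h * p)     # prey killed and predator born
        total = sum(rates)
        t += rng.expovariate(total)        # waiting time to the next event
        r = rng.random() * total           # choose which event occurs
        if r < rates[0]:
            h += 1
        elif r < rates[0] + rates[1]:
            p -= 1
        elif r < rates[0] + rates[1] + rates[2]:
            h -= 1
        else:
            h -= 1
            p += 1
        path.append((t, h, p))
    return path

path = simulate(25, 2, 50.0, random.Random(1))
```

Each run gives a different series; repeating the simulation with different seeds gives some idea of how often the predators die out directly and how often the prey are exterminated first.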

A quantitative comparison with suitable laboratory data would be useful, but there are 
at present difficulties in any very precise comparisons. The first are theoretical, and arise 


[Figure. Vertical axis: number of individuals (prey and predators). Horizontal axis: time, 0 to 400.]

Fig. 1. An artificial realization of a stochastic model for the prey-predator relation.











[Figure. Horizontal axis: predators.]

Fig. 2. Prey-predator 'cycles' for the artificial stochastic series. The closed
deterministic cycle is also shown (the dotted curve) for comparison.










from the mathematical intractability of a complete solution even of these over-simplified 
models (it is for this reason that the construction of artificial series is particularly informa- 
tive). The second arise from the extra complications present in real animal populations. 
Thus a striking and somewhat unique example of oscillations obtained by Gause (1935), 
in which the prey were yeast cells Saccharomyces exiguus and the predator Paramecium 
aurelia, has been criticized by Andrewartha & Birch (1954, p. 440) because, for instance, of
the tendency in such experiments for sedimentation of the yeast at the bottom of the culture.
While we should bear in mind such objections, we might notice that the data in this experi-
ment appear reasonably consistent with the above simple theory. Thus, from the data given
by Gause, we have $\alpha_1 \sim 1$, $\alpha_2 \sim 0.45$, and since from the graph it appears that $P_0 \sim 100$,
$H_0 \sim 1.5 \times 10^7$, we should conclude that $\beta_1 \sim 0.01$, $\beta_2 \sim 3 \times 10^{-8}$. From the theory of (small)
oscillations about $(H_0, P_0)$, the period of a cycle, which is approximately $2\pi/\sqrt{(\alpha_1\alpha_2)}$, is
calculated to be 9.4 days, a 'prediction' in quite fair agreement with the observed period
of about 8 days.* Moreover, while the ultimate extinction of the predator on the stochastic
model has been noted, the chances of extinction after one or two cycles may be relatively 
small (as in the artificial series above), or even microscopically small in certain cases. In 
spite of the absence of complete solutions, it is known in epidemic theory (Bartlett, 1956) 
that such chances of extinction depend very critically on the magnitudes of the coefficients 
occurring in the equations. An approximate device employed in such a context for assessing 
the order of magnitude of the chances of extinction may be used in the present context also. 
It consists in neglecting some of the variation in the numbers of the second species when the 
number of the first species is low and liable to extinction. Thus if in the example under 
discussion we note that H is large and thus on the theoretical model fairly safe for survival 
(any real ‘sedimentation effect’ would support this assumption though a strong dependence 
between individuals of large colonies would not), we need merely consider the chance of 
extinction of P at the bottom of its cycle. At this point $H \approx H_0$ and the birth- and death-rates
for P are equal. If this situation could maintain itself, P would ultimately become extinct
(see, for example, Bartlett, 1955, p. 71). The fact that, for low P, H is increasing will, however,
modify such a certainty. If we put roughly for the effective birth-rate $\alpha_2 + c\sin(2\pi t/T)$,
where $T$ is the period and $t$ is reckoned from the point of low P, then from known theory
(Kendall, 1948) the chance of extinction is given by $(1 + 1/J)^{-P} \approx e^{-P/J}$, where

$$J = \int_0^t \alpha_2 e^{\rho(x)}\,dx, \qquad \rho(x) = -c\int_0^x \sin(2\pi y/T)\,dy = (\tfrac{1}{2}cT/\pi)\{\cos(2\pi x/T) - 1\}.$$

This gives, if $b = \tfrac{1}{2}T/\pi$,

$$J = \alpha_2\int_0^t e^{bc\{\cos(x/b) - 1\}}\,dx.$$

This integral may be evaluated as a series of Bessel functions for general $t$, but while it is of
particular interest to us for $t$ of the order of $\tfrac{1}{2}T$, it is convenient to remember that the chance
of extinction of P must be a strictly increasing function of $t$, and so we shall not underestimate
its value if we choose $t$ to be larger than this, say $T$. For this value of $t$

$$J = T\alpha_2 e^{-bc} I_0(bc),$$


* The ‘observed’ period for the mock series (Fig. 1) suggests that the theoretical formula may still 
approximately be applied even for quite large oscillations. 







where $I_0(x)$ is the Bessel function of zero order (of the first kind, and of imaginary argument).
By inspection of the graph given by Gause, c is about 3a,. Hence, with a, = 0-45 and 
$T = 9.4$, it is found that $J = 2.6$, and thus when $P \approx 15$, the chance of extinction during the
critical part of the cycle is of order $e^{-P/J} \approx 0.003$.* Thus while it has been suggested by
Andrewartha & Birch (1954, p. 440) that Paramecium avoid extinction by growing smaller 
when no food is available, these results seem at least reasonably compatible with the simple 
mathematical model. 
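The arithmetic of this passage can be checked with a short sketch. It evaluates the small-oscillation period, compares the closed form for J over one full period with a direct numerical integration, and evaluates the approximate extinction chance for the quoted P = 15 and J = 2.6. The function names are introduced here for illustration, and the value `c = 0.3` is an arbitrary placeholder, not the value read from Gause's graph.

```python
import math


def bessel_i0(x, terms=40):
    """Modified Bessel function I_0(x) from its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def small_oscillation_period(a1, a2):
    """Period 2*pi / sqrt(a1 * a2) of small oscillations about equilibrium."""
    return 2.0 * math.pi / math.sqrt(a1 * a2)


def j_closed_form(a2, c, period):
    """J = T * a2 * exp(-b c) * I_0(b c) for t = T, with b = T / (2 pi)."""
    b = period / (2.0 * math.pi)
    return period * a2 * math.exp(-b * c) * bessel_i0(b * c)


def j_numeric(a2, c, period, t, n=20_000):
    """Midpoint-rule value of a2 * integral_0^t exp(b c (cos(x/b) - 1)) dx."""
    b = period / (2.0 * math.pi)
    step = t / n
    return a2 * step * sum(
        math.exp(b * c * (math.cos((i + 0.5) * step / b) - 1.0))
        for i in range(n)
    )


period = small_oscillation_period(1.0, 0.45)   # alpha_1, alpha_2 quoted in the text
print(f"period = {period:.2f} days")

c_assumed = 0.3                                # placeholder amplitude (assumption)
print(j_closed_form(0.45, c_assumed, 9.4), j_numeric(0.45, c_assumed, 9.4, 9.4))

# Approximate chance of extinction e^{-P/J} for the quoted P = 15, J = 2.6.
print(f"exp(-P/J) = {math.exp(-15 / 2.6):.4f}")
```

The two values of J should agree closely for any amplitude c, since the closed form is just the integral over a full period.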

The relevance of these extinction probabilities in ecology would be more convincing if 
further data were available where extinction did in fact occur, and calculation gave a high 
chance of extinction in such cases. First of all it may be checked that this chance is high 
for the artificial series of Fig. 1. Using the deterministic solution to calculate the order of 
magnitude of low H or low P after one cycle, with the given initial conditions, the chance of 
extinction per cycle was calculated to be of the order 0-6 for P and 0-3 for H. However, 
a corresponding calculation for the experimental data on mites reported by Gause, Smarag- 
dova & Witt (1936) gave a very small chance of extinction per cycle, in contrast with the 
invariable extinction actually observed. The specific calculation was made for the following 
series (in wheat flour): 


Days                                0    6   12   23   27   32   35   38   41   44   47
Prey (Aleuroglyphus agilis)        50   24   28  256  408  496  288   32    -   20    -
Predators (Cheyletus eruditus)      5    +   12   12   24   64   96  120   44   24    8

The numbers are estimated total numbers at all stages of growth, excluding eggs; quoted from Gause
et al. (1936).


Only very crude estimates ($\alpha_1 \approx 0.08$, $\alpha_2 \approx 0.3$, $\beta_1 \approx 0.004$, $\beta_2 \approx 0.003$) of the coefficients
were made from these data, as there appears no doubt from these and similar data that the 
model is inappropriate because the true cycle is deterministically unstable. The effect of a 
time lag in the growth of new individuals is one possible explanation of this, and is examined 
further in the next section. 


4. THE EFFECT OF A LAG IN BIRTHS 


It is assumed that there is a lag $\tau_1$ in the growth to maturity of the prey, so that (in terms of
adults) the increases may approximately for small $\tau_1$ be regarded as due to the prey present
at time $\tau_1$ previously. A lag $\tau_2$ is similarly assumed for new predators. This modifies equations
(6) to

$$\begin{aligned} DH(t) &= \alpha_1 H(t-\tau_1) - \beta_1 P(t)H(t), \\ DP(t) &= -\alpha_2 P(t) + \beta_2 H(t-\tau_2)P(t-\tau_2). \end{aligned} \tag{9}$$

In terms of the differential operator $D$, equations (9) can be written

$$\begin{aligned} Dh &= \alpha_1 e^{-\tau_1 D}(1+h) - \alpha_1(1+h)(1+p), \\ Dp &= -\alpha_2(1+p) + \alpha_2 e^{-\tau_2 D}(1+h)(1+p), \end{aligned} \tag{10}$$


* In this calculation the value 15 for P was taken as typical from the graph; if alternatively, as in the 
further examples, it is calculated theoretically from the initial conditions, the value obtained for the 
chance of extinction is even smaller. 


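where $H = H_0(1+h)$, $P = P_0(1+p)$. For $h$, $p$ (and $\tau_1$, $\tau_2$) small, equations (10) become
approximately

$$\begin{aligned} Dh &\approx -\tau_1\alpha_1 Dh - \alpha_1 p, \\ Dp &\approx \alpha_2 h - \alpha_2\tau_2 D(h+p). \end{aligned}$$

The character of small oscillations about $H_0$, $P_0$ is determined by the equation in $D$,

$$D^2(1+\tau_1\alpha_1)(1+\tau_2\alpha_2) - \alpha_1\alpha_2(\tau_2 D - 1) = 0,$$

giving rise to instability (for $\tau_2 > 0$) even on the deterministic formulation. The approximate
solution for $h$, $p$ is found to be

$$\begin{aligned} h &\approx (A\cos\theta t + B\sin\theta t)\,e^{\frac{1}{2}\alpha_1\alpha_2\tau_2 t}, \\ \alpha_1 p &\approx (A\theta\sin\theta t - B\theta\cos\theta t)(1+\tau_1\alpha_1)\,e^{\frac{1}{2}\alpha_1\alpha_2\tau_2 t} - \tfrac{1}{2}\alpha_1\alpha_2\tau_2 h, \end{aligned}$$

where $\theta^2 \approx \alpha_1\alpha_2(1-\tau_1\alpha_1)(1-\tau_2\alpha_2)$. Thus

$$\alpha_1 p^2(1 - \tau_1\alpha_1 + \tau_2\alpha_2) + \alpha_1\alpha_2\tau_2 hp + \alpha_2 h^2 \approx C e^{\alpha_1\alpha_2\tau_2 t}, \tag{12}$$

where $C = \alpha_2(A^2 + B^2)$ is a constant.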
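The instability can be checked numerically by solving the quadratic in D directly. The sketch below is illustrative only: the function name is introduced here, and the coefficient values are arbitrary choices made for the example. For small lags the real part of the roots should be close to half of alpha_1 alpha_2 tau_2, and it should vanish when there is no lag.

```python
import cmath


def lag_roots(a1, a2, tau1, tau2):
    """Roots of D^2 (1 + tau1 a1)(1 + tau2 a2) - a1 a2 (tau2 D - 1) = 0."""
    qa = (1.0 + tau1 * a1) * (1.0 + tau2 * a2)   # coefficient of D^2
    qb = -a1 * a2 * tau2                         # coefficient of D
    qc = a1 * a2                                 # constant term
    disc = cmath.sqrt(qb * qb - 4.0 * qa * qc)
    return (-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa)


# Illustrative coefficients (assumptions chosen for the example).
a1, a2, tau1, tau2 = 0.25, 0.15, 4.0, 4.0
roots = lag_roots(a1, a2, tau1, tau2)
for z in roots:
    print(f"root = {z:.4f}, growth rate = {z.real:.4f}")

# First-order approximation to the growth rate, valid for small lags only.
print(f"0.5 * a1 * a2 * tau2 = {0.5 * a1 * a2 * tau2:.4f}")

# Without any lag the roots are purely imaginary: the classical closed cycle.
print(lag_roots(a1, a2, 0.0, 0.0))
```

With lags as large as those in this example the first-order value overstates the exact growth rate, which is a reminder that the approximate solution is a small-lag result.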







[Figure 3 plot: number of individuals against Time (days), 0 to 80; curves for the calculated series and for the first and second observed series, each shown for hosts and parasites.]
Fig. 3. Numbers of ‘hosts’ and ‘parasites’ in a deterministic model with age-lag. 
Two observed series taken from Gause et al. (1936) are shown for comparison. 


The chance of extinction would obviously be high if the deterministic number H at the 
bottom of the cycle is as low as, say, 4. A calculation based on equation (12) (while rather 
rough, because this equation does not exactly apply except for small h, p) indicated that 
$\tau_2$ need not be more than about 5 days, suggesting that the above simple modification might
be sufficient to bring the assumed model more in line with the data under discussion. 

A numerical check on this suggestion was considered advisable, especially as the new 
model implies a modified growth rate for the hosts (in the absence of the predators) of
approximately $\alpha_1/(1+\tau_1\alpha_1)$. A completely new set of values of the coefficients was estimated, still
rather roughly, as $\tau_1 = \tau_2 = 4$, $\alpha_1 \approx 0.25$, $\alpha_2 \approx 0.15$, $\beta_1 \approx 0.015$, $\beta_2 \approx 0.002$. The deterministic
series calculated for these values are compared in Fig. 3 with the corresponding observed 
values quoted above. Some apparent discrepancies remain, but it should be noticed that 
a second observed series with the same initial conditions as the first (also shown in Fig. 3) 
shows considerable divergence from it. Thus any more detailed analysis, while no doubt 










requiring a rather more elaborate study of age structure, would have to allow for such 
stochastic fluctuations. The immediately relevant point, however, is that an effectively 
zero value of H at the end of the cycle follows from this slight modification of the model. 


5. THE EFFECT OF IMMIGRATION 


The stochastic instability of even the classical prey-predator model (6) naturally suggests 
that the theoretical effect of immigration should be considered (cf. Bartlett, 1956). In simple 
epidemic models of the mixed susceptible and infected population (with infection by con- 
tagion) it is well known that epidemics cannot recur unless the susceptibles are renewed. 
A more difficult point is to what extent renewal of infecteds from outside is necessary, but 
recurrence is in any case made certain by such renewals. In the prey-predator system, the 
immigration of either prey or predator will prevent absorption at the corresponding axis 
$H = 0$ or $P = 0$, but absorption at the other is still possible. If, however, damping terms are
introduced into the deterministic model by such immigration, extinction may be greatly 
delayed. Finally, if immigration of both prey and predators is allowed, it is obvious that 
extinction is completely prevented. 

It was consequently interesting to find that Gause (1934) had introduced immigration 
experimentally into his laboratory cultures, to avoid the extinction difficulty. His theo- 
retical reasons seemed, however, rather ad hoc and incomplete, for we have seen that with 
small laboratory populations extinction would not, even in the absence of further com- 
plications, be necessarily unexpected on an explicit stochastic formulation. 

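With immigration, equations (6) are modified to

$$\begin{aligned} DH &= (\alpha_1 - \beta_1 P)H + \epsilon_1, \\ DP &= (-\alpha_2 + \beta_2 H)P + \epsilon_2. \end{aligned} \tag{13}$$

The deterministic equilibrium point is now given by $(H_0', P_0')$, where

$$\beta_1\alpha_2 P_0'^2 - P_0'(\beta_1\epsilon_2 + \beta_2\epsilon_1 + \alpha_1\alpha_2) + \alpha_1\epsilon_2 = 0. \tag{14}$$

There is one relevant positive root, as

$$(\beta_1\epsilon_2 + \beta_2\epsilon_1 + \alpha_1\alpha_2)^2 > 4\alpha_1\alpha_2\beta_1\epsilon_2;$$

for small $\epsilon_1$, $\epsilon_2$,

$$P_0' \approx \frac{\alpha_1}{\beta_1} + \frac{\epsilon_1\beta_2}{\alpha_2\beta_1}, \qquad H_0' \approx \frac{\alpha_2}{\beta_2} - \frac{\epsilon_2\beta_1}{\alpha_1\beta_2}.$$

For small oscillations $h$, $p$, where $H = H_0'(1+h)$, $P = P_0'(1+p)$,

$$\begin{aligned} Dh &\approx -h\epsilon_1\beta_2/\alpha_2 - \beta_1 P_0' p, \\ Dp &\approx -p\epsilon_2\beta_1/\alpha_1 + \beta_2 H_0' h, \end{aligned} \tag{15}$$

and the equation in $D$,

$$D^2 + D(\epsilon_1\beta_2/\alpha_2 + \epsilon_2\beta_1/\alpha_1) + \beta_1\beta_2 H_0' P_0' \approx 0,$$

implies damping provided $\epsilon_1$ and/or $\epsilon_2 > 0$.

The deterministic damping operates at all amplitudes, at least for small $\epsilon_1$, $\epsilon_2$. For if the
function (cf. equation (7))

$$F(H, P) \equiv \beta_1(P - P_0'\log P) + \beta_2(H - H_0'\log H),$$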













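is differentiated, it is found that

$$DF = -\left\{\frac{\epsilon_1}{\alpha_2 H}(\beta_2 H - \alpha_2)^2 + \frac{\epsilon_2}{\alpha_1 P}(\beta_1 P - \alpha_1)^2\right\} + O(\epsilon_1^2, \epsilon_2^2),$$

so that $F$ always decreases in time until it reaches its minimum value in the neighbourhood
of $(\alpha_2/\beta_2, \alpha_1/\beta_1)$, at the point $(H_0', P_0')$.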

In view of the instability found in § 4 due to the lag from birth to reproductive age, it is 
advisable to combine the two effects of such lags and immigration. Since the effects are 
additive to the first order (for oscillations of small amplitude), it is readily seen that deter- 
ministic damping will still be maintained provided 


$$\epsilon_1/H_0' + \epsilon_2/P_0' > \alpha_1\alpha_2\tau_2. \tag{16}$$
As deterministic damping for small $\epsilon_1$, $\epsilon_2$ may be only slight, the amplitude of oscillations


may evidently be considerable even when a stochastic ‘steady state’ (or true stationarity) 
is feasible. 
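The equilibrium with immigration and the resulting damping can be evaluated numerically. The sketch below solves the quadratic for the predator equilibrium, takes the larger root (the one that gives a positive prey level), and reports the damping coefficient of the small-oscillation equation. The function names and all coefficient values are assumptions chosen for illustration.

```python
import math


def immigration_equilibrium(a1, a2, b1, b2, e1, e2):
    """Positive equilibrium of DH = (a1 - b1 P) H + e1, DP = (-a2 + b2 H) P + e2."""
    qa = b1 * a2
    qb = b1 * e2 + b2 * e1 + a1 * a2
    qc = a1 * e2
    disc = qb * qb - 4.0 * qa * qc
    p_eq = (qb + math.sqrt(disc)) / (2.0 * qa)   # larger root: the relevant one
    h_eq = (a2 * p_eq - e2) / (b2 * p_eq)        # from DP = 0
    return h_eq, p_eq


def damping_coefficient(a1, a2, b1, b2, e1, e2):
    """Coefficient of D in the small-oscillation equation with immigration."""
    return e1 * b2 / a2 + e2 * b1 / a1


# Illustrative coefficients (assumptions, not values from the text).
a1, a2, b1, b2, e1, e2 = 1.0, 0.5, 0.1, 0.02, 0.2, 0.1
h_eq, p_eq = immigration_equilibrium(a1, a2, b1, b2, e1, e2)
print(f"H0' = {h_eq:.3f}, P0' = {p_eq:.3f}")

# Both rates of change should vanish at the equilibrium point.
dh = (a1 - b1 * p_eq) * h_eq + e1
dp = (-a2 + b2 * h_eq) * p_eq + e2
print(f"DH = {dh:.2e}, DP = {dp:.2e}")

# Small-immigration approximations for comparison.
print(a2 / b2 - e2 * b1 / (a1 * b2), a1 / b1 + e1 * b2 / (a2 * b1))
print(f"damping coefficient = {damping_coefficient(a1, a2, b1, b2, e1, e2):.4f}")
```

Comparing the damping coefficient with the product of the two growth coefficients and a lag gives a quick check of whether damping would survive a lag in births of that size.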


6. COMPETITION BETWEEN TWO SPECIES 
In the generalization of equation (1) to two species, the simplest model is represented by
the two equations

$$\begin{aligned} DN_1 &= (\alpha_1 - \beta_{11}N_1 - \beta_{12}N_2)N_1, \\ DN_2 &= (\alpha_2 - \beta_{21}N_1 - \beta_{22}N_2)N_2, \end{aligned} \tag{17}$$

where $N_1$ and $N_2$ refer to the two species. A further simplification is obtained if it is assumed
that the restrictive action of the populations is sufficiently well represented by a combined
population size $N = N_1 + \lambda N_2$, so that $\beta_{12}/\beta_{11} = \beta_{22}/\beta_{21} = \lambda$ (see, for example, Kostitzin,
1939, p. 124). Equations (17) then provide equations*

$$\begin{aligned} DN_1 &= (\alpha_1 - \beta_1 N)N_1, \\ DN_2 &= (\alpha_2 - \beta_2 N)N_2, \end{aligned} \tag{18}$$

where $\beta_1 = \beta_{11}$, $\lambda\beta_2 = \beta_{22}$, whence

$$\begin{aligned} D(\beta_2\log N_1 - \beta_1\log N_2) &= \alpha_1\beta_2 - \alpha_2\beta_1, \\ N_1^{\beta_2}/N_2^{\beta_1} &= (n_1^{\beta_2}/n_2^{\beta_1})\,e^{(\alpha_1\beta_2 - \alpha_2\beta_1)t}, \end{aligned}$$

where $n_1$ and $n_2$ are the initial values of $N_1$ and $N_2$. This result implies that $N_1$ or $N_2$ tends to
zero according as $\alpha_1\beta_2$ is less than or greater than $\alpha_2\beta_1$, and the remaining species obeys the
single logistic equation (1). This deterministic result is unlikely to be affected by stochastic
fluctuations unless $\alpha_1\beta_2 \approx \alpha_2\beta_1$, but in this case they may sometimes be important. In particular,
suppose the two species refer strictly to two different genealogical lines of the same
species, so that all the corresponding coefficients in equations (1) are equal. Then
$D(N_1 - N_2) = 0$ when $N_1 = N_2$ initially, and with no deterministic damping in $N_1 - N_2$,
stochastic drift will take place until either $N_1$ or $N_2$ is zero. This point is noted because it has
some relevance to the extinction phenomenon found experimentally in Park's experiments
with Tribolium, although a more appropriate model will be assumed for this case.

If $\beta_{12}/\beta_{11} \neq \beta_{22}/\beta_{21}$, the equations (17) give a possible equilibrium point (with $N_1, N_2 > 0$)

$$N_1 = \frac{\alpha_1\beta_{22} - \alpha_2\beta_{12}}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}}, \qquad N_2 = \frac{\alpha_2\beta_{11} - \alpha_1\beta_{21}}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}},$$


* It should be noted that (in contradiction of a claim by Andrewartha & Birch (1954, p. 411)), 
there is no logical inconsistency in $DN_1$ or $DN_2$ being negative. The terms on the right of the equations
are a balance of births and deaths, and will give a negative total if the deaths predominate. 







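which is found to be stable if

$$\alpha_1\beta_{22} > \alpha_2\beta_{12}, \qquad \alpha_1\beta_{21} < \alpha_2\beta_{11},$$

and unstable if

$$\alpha_1\beta_{22} < \alpha_2\beta_{12}, \qquad \alpha_1\beta_{21} > \alpha_2\beta_{11}.$$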


The last case is interesting, as it implies on the deterministic model that which species 
survives depends on the initial conditions; hence in a complete stochastic model random 
variation at the beginning of population growth might be a crucial factor in the survival 
of one or other species.* 
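The equilibrium point of the two-species competition equations and the two stability cases can be illustrated with a small sketch. The function names and the numerical coefficients are assumptions made for the example; the classification simply applies the two pairs of inequalities to the coefficients.

```python
def competition_equilibrium(a1, a2, b11, b12, b21, b22):
    """Interior equilibrium of DN1 = (a1 - b11 N1 - b12 N2) N1 and
    DN2 = (a2 - b21 N1 - b22 N2) N2, found by solving the linear system."""
    det = b11 * b22 - b12 * b21
    if det == 0:
        raise ValueError("b12/b11 equals b22/b21: no isolated equilibrium")
    n1 = (a1 * b22 - a2 * b12) / det
    n2 = (a2 * b11 - a1 * b21) / det
    return n1, n2


def classify(a1, a2, b11, b12, b21, b22):
    """Apply the two pairs of inequalities for stability and instability."""
    if a1 * b22 > a2 * b12 and a1 * b21 < a2 * b11:
        return "stable"
    if a1 * b22 < a2 * b12 and a1 * b21 > a2 * b11:
        return "unstable"
    return "no positive equilibrium"


# Illustrative coefficients (assumptions): weak and strong cross-competition.
weak = (1.0, 1.0, 0.02, 0.01, 0.01, 0.02)
strong = (1.0, 1.0, 0.01, 0.02, 0.02, 0.01)
for label, coeffs in (("weak", weak), ("strong", strong)):
    n1, n2 = competition_equilibrium(*coeffs)
    print(f"{label}: N1 = {n1:.2f}, N2 = {n2:.2f}, {classify(*coeffs)}")
```

In the unstable case the equilibrium exists but is a saddle, so which species survives depends on the starting numbers, which is where random variation early in population growth can matter.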

The Tribolium model assumed follows the lines suggested by Neyman et al. (1956), 
where the competition between the two species, T. castaneum and T. confusum, is interpreted
more as a predator-prey interaction than a direct competition for food. The four
stages of growth are for simplicity condensed into two, referred to as ‘passive’ and ‘active’ 
(the adult stage), and the control of numbers arises apparently from a cannibalistic pro- 
pensity for the adult to consume the young or ‘passive’. This cannibalism appears to apply 
equally to the devouring of the young of either species, though the adult propensities may 
differ for the two species. 

The Neyman, Park and Scott model was formulated in terms of discrete generations. The 
general type of stochastic formulation adopted in the present paper can include such dis- 
crete generation models (cf. § 4), but the essential points seem to emerge from the simplest 
stochastic formulation in continuous time of the type represented by equation (3), and 
this will be used here. The single species case is of some interest for comparison with the 
straightforward logistic model, and is considered first. Spatial ‘diffusion’ of the beetle is 
ignored, though it may be an important further factor in the actual experiments. 

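The deterministic form of the model assumed is

$$\begin{aligned} DP &= -\mu AP - \nu P + \lambda A, \\ DA &= \nu P - \epsilon A, \end{aligned} \tag{19}$$

where $P$ is the number of passive, $A$ of active, beetles, and $\lambda$ and $\epsilon$ ($< \lambda$) are birth- and death-rates,
$\nu$ is the transition rate from passive to active, and $\mu$ is a 'voracity' coefficient. The
equilibrium point is

$$P_0 = (\lambda - \epsilon)/\mu, \qquad A_0 = \nu(\lambda - \epsilon)/(\mu\epsilon). \tag{20}$$

For small oscillations, put $P = P_0(1+p)$, $A = A_0(1+a)$. Then

$$\begin{aligned} Dp &= -\lambda\nu p/\epsilon + \nu a - \nu(\lambda - \epsilon)ap/\epsilon \\ &\approx -\lambda\nu p/\epsilon + \nu a, \\ Da &= \epsilon(p - a); \end{aligned}$$

and the equation in $D$,

$$D^2 + D(\epsilon + \lambda\nu/\epsilon) + (\lambda - \epsilon)\nu = 0,$$

indicates damped oscillations. Note that for the function

$$f(p, a) = p - \log(1+p) + \nu[a - \log(1+a)]/\epsilon,$$

$$\begin{aligned} Df &= -\frac{\lambda\nu p^2(1+a)}{\epsilon(1+p)} + \frac{\nu a(ap + 2p - a)}{1+a} \\ &< -\frac{\nu p^2(1+a)}{1+p} + \frac{\nu a(ap + 2p - a)}{1+a} = -\frac{\nu(a - p)^2}{(1+p)(1+a)}, \end{aligned}$$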


* The possibility of such an effect occurring with Tribolium was raised by Dr P. H. Leslie in the
discussion following a seminar I gave at Oxford on 15 May 1956. 










so that f decreases in time, implying that deterministic damping operates at large as well 
as small amplitudes. 
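The exact stochastic equations assumed in place of (19) are (cf. equation (3))

$$\begin{aligned} \delta P &= (-\nu P + \lambda A - \mu AP)\,dt - \delta Z_1 + \delta Z_2 - \delta Z_3, \\ \delta A &= (\nu P - \epsilon A)\,dt + \delta Z_1 - \delta Z_4, \end{aligned} \tag{21}$$

where the variances of $\delta Z_1$, $\delta Z_2$, $\delta Z_3$ and $\delta Z_4$ are indicated by the respective deterministic
terms $\nu P\,dt$, $\lambda A\,dt$, etc. With $P = P_0(1+p)$, $A = A_0(1+a)$ as before, and in the case of small
fluctuations,

$$\begin{aligned} dp &\approx (-\lambda\nu p/\epsilon + \nu a)\,dt + (-\delta Z_1 + \delta Z_2 - \delta Z_3)/P_0, \\ da &\approx \epsilon(p - a)\,dt + (\delta Z_1 - \delta Z_4)/A_0. \end{aligned} \tag{22}$$

While presumably on the above model extinction of the population will occur after a long
enough time,* this may (for a deterministic 'ceiling' population not too small, but fluctuations
relatively small) be so long delayed as to be negligible and an effective or quasi-stationarity
be established. Under such conditions, equations (22) yield, on squaring or
multiplying together the expressions for $p + dp$, $a + da$, and averaging,

$$\begin{aligned} 0 &\approx -2\lambda\nu\sigma_p^2/\epsilon + 2\nu\,\mathrm{cov}(p, a) + (\lambda A_0 + \nu P_0 + \mu A_0P_0)/P_0^2, \\ 0 &\approx 2\epsilon\,\mathrm{cov}(p, a) - 2\epsilon\sigma_a^2 + (\nu P_0 + \epsilon A_0)/A_0^2, \\ 0 &\approx -(\lambda\nu/\epsilon)\,\mathrm{cov}(p, a) + \nu\sigma_a^2 + \epsilon\sigma_p^2 - \epsilon\,\mathrm{cov}(p, a) - \nu P_0/(P_0A_0), \end{aligned}$$

whence

$$\sigma_p^2 \approx \frac{\lambda(\lambda\nu + \epsilon^2 - \epsilon\nu)}{P_0(\lambda - \epsilon)(\lambda\nu + \epsilon^2)}, \qquad \sigma_a^2 \approx \frac{\epsilon^2(\lambda - \epsilon) + \lambda^2\nu}{A_0(\lambda - \epsilon)(\lambda\nu + \epsilon^2)}, \qquad \mathrm{cov}(p, a) \approx \frac{\lambda\epsilon^2}{P_0(\lambda - \epsilon)(\lambda\nu + \epsilon^2)}. \tag{23}$$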
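The closed forms for the variances and covariance can be checked by solving the three moment equations as a linear system. The sketch below does this with a small Gaussian elimination routine; the function names are introduced here, and the parameter values are those quoted later in the text for the symmetric illustration, used here simply as a convenient test case.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def moments_numeric(lam, mu, nu, eps):
    """Solve the three moment equations for (var p, var a, cov(p, a))."""
    p0 = (lam - eps) / mu
    a0 = nu * (lam - eps) / (mu * eps)
    coeff = [
        [-2.0 * lam * nu / eps, 0.0, 2.0 * nu],
        [0.0, -2.0 * eps, 2.0 * eps],
        [eps, nu, -(lam * nu / eps + eps)],
    ]
    rhs = [
        -(lam * a0 + nu * p0 + mu * a0 * p0) / p0 ** 2,
        -(nu * p0 + eps * a0) / a0 ** 2,
        nu * p0 / (p0 * a0),
    ]
    return solve_linear(coeff, rhs)


def moments_closed_form(lam, mu, nu, eps):
    """Closed-form solution of the same three equations."""
    p0 = (lam - eps) / mu
    a0 = nu * (lam - eps) / (mu * eps)
    d = (lam - eps) * (lam * nu + eps ** 2)
    var_p = lam * (lam * nu + eps ** 2 - eps * nu) / (p0 * d)
    var_a = (eps ** 2 * (lam - eps) + lam ** 2 * nu) / (a0 * d)
    cov = lam * eps ** 2 / (p0 * d)
    return [var_p, var_a, cov]


print(moments_numeric(2.0, 0.05, 1.0, 1.0))
print(moments_closed_form(2.0, 0.05, 1.0, 1.0))
```

The two printed triples should agree to rounding error for any admissible parameter values with the birth-rate above the death-rate.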


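When the two Tribolium species are put together, they are denoted by suffices 1 and 2,
and the following (deterministic) equations assumed.

$$\begin{aligned} DP_1 &= -\mu_1P_1A_1 - \mu_2P_1A_2 - \nu_1P_1 + \lambda_1A_1, \\ DA_1 &= \nu_1P_1 - \epsilon_1A_1, \\ DP_2 &= -\mu_2P_2A_2 - \mu_1P_2A_1 - \nu_2P_2 + \lambda_2A_2, \\ DA_2 &= \nu_2P_2 - \epsilon_2A_2. \end{aligned} \tag{24}$$

An attempt to find a single equilibrium point yields the two (in general) incompatible
equations

$$\begin{aligned} -\frac{\mu_1\nu_1}{\epsilon_1}P_1 - \frac{\mu_2\nu_2}{\epsilon_2}P_2 - \nu_1 + \frac{\lambda_1\nu_1}{\epsilon_1} &= 0, \\ -\frac{\mu_1\nu_1}{\epsilon_1}P_1 - \frac{\mu_2\nu_2}{\epsilon_2}P_2 - \nu_2 + \frac{\lambda_2\nu_2}{\epsilon_2} &= 0, \end{aligned} \tag{25}$$

these representing parallel lines in the $P_1$, $P_2$ plane (cf. the discussion in Neyman et al. 1956).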


* There is strictly no stochastic upper limit to the population in contrast with the simpler model
assumed in § 2, though the effective limit implied by the non-linear 'voracity' term suggests that the
ultimate extinction property should still hold.







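Before attempting to consider the problem of stability in relation to the zone thus
defined, it is instructive to consider the case when the two species become identical in
behaviour. In this case, if

$$P_1 + P_2 = X, \quad A_1 + A_2 = Y, \quad P_1 - P_2 = U, \quad A_1 - A_2 = V,$$

$$\begin{aligned} DX &= -\mu XY - \nu X + \lambda Y, \\ DY &= \nu X - \epsilon Y, \end{aligned} \tag{26}$$

where $\lambda_1 = \lambda_2 = \lambda$, etc., and

$$\begin{aligned} DU &= -\mu UY - \nu U + \lambda V, \\ DV &= \nu U - \epsilon V. \end{aligned} \tag{27}$$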
[Figure 4 plot: Second species against First species.]

Fig. 4. Graph of total numbers of two competing Tribolium species in a simplified
stochastic model, symmetric case (the dotted line is the 'equilibrium' line).


The first pair of equations in $X$ and $Y$ are (as they must be) identical with (19), and refer
merely to the total population. As $Y$ is damped, it may approximately, even in the stochastic
model if numbers are not too small, be replaced by its equilibrium value $\nu(\lambda - \epsilon)/(\mu\epsilon)$.
Then equations (27) become

$$\begin{aligned} DU &= -\frac{\lambda\nu}{\epsilon}U + \lambda V, \\ DV &= \nu U - \epsilon V, \end{aligned}$$

with any equilibrium solution $U_0/V_0 = \epsilon/\nu$. Hence in the stochastic model there will be random
'drift' along the line $U = \epsilon V/\nu$ until extinction of one or other of the two species occurs.
This is not altogether unexpected, for these symbols now refer to individual genealogical
lines of the same species. The case of two similar species should not be too different from this
limiting symmetrical case, for which the chance of extinction of either species first, for
symmetrical initial conditions, must be $\tfrac{1}{2}$. However, if in equations (24) the equilibrium
ratios $A_1/P_1 = \nu_1/\epsilon_1$, $A_2/P_2 = \nu_2/\epsilon_2$ are inserted, the incompatible equilibrium equations
(25) are replaced by

$$\begin{aligned} D\log P_1 &= -\beta_1P_1 - \beta_2P_2 + \alpha_1, \\ D\log P_2 &= -\beta_1P_1 - \beta_2P_2 + \alpha_2, \end{aligned} \tag{28}$$









where $\alpha_1 = \lambda_1\nu_1/\epsilon_1 - \nu_1$, $\beta_1 = \mu_1\nu_1/\epsilon_1$, etc., a particular case of the competition equations
(17). From (28),

$$D\log(P_1/P_2) = \alpha_1 - \alpha_2,$$

and $P_1/P_2$ tends to increase or decrease according as $\alpha_1$ is greater or less than $\alpha_2$. Hence


there will be a tendency, as the differences in the characteristics of the two species increase, 


[Figure 5 plot: Second species against First species, both axes 0 to 60.]

Fig. 5. Graph of total numbers of two competing Tribolium species in a simplified stochastic
model, asymmetric case (the dotted line joins the equilibrium points on the two axes).






[Figure 6 plot: No. of individuals against Unit time (0 to 8.0); full line, first species; broken line, second species.]


Fig. 6. Total numbers of individuals of the two competing species plotted 
against time (arbitrary units) in the asymmetric stochastic model. 


for the superiority of one species to become more marked and lead more frequently to the 
survival of that species. 

A preliminary illustration of these last conclusions is depicted in Figs. 4 and 5. In the
first a Monte Carlo realization is shown (for the total numbers of each species) for the symmetrical
case, with the rather arbitrary values $\lambda = 2$, $\mu = 0.05$, $\nu = 1$, $\epsilon = 1$, beginning with







$A_1 = A_2 = 10$. The 'equilibrium' line along which stochastic drift occurs is shown dotted.
(The corresponding time realization has not been completed, but an idea of the relative
time elapsing before extinction of one species occurred is obtained from the number of
steps represented in the graphs, viz. 1224 for Fig. 4 and 652 for Fig. 5.) In the case shown in
Fig. 5, $\lambda_1$ was altered to 3, and $\epsilon_1$ to 2 (with $\lambda_2$ still 2, $\epsilon_2 = 1$), and the bias towards extinction
of species 1 thus created ($\alpha_2 > \alpha_1$) was supported by the corresponding (and more rapid)
extinction in the realization. In this case the time realization is also shown, in Fig. 6.

No quantitative comparison with actual Tribolium data has been envisaged at this
stage; this will require further investigation. The main object in this preliminary discussion
has been to demonstrate qualitatively some of the observational features. Even with
the present simple Tribolium model (adapted from that formulated by Neyman, Park and
Scott) the times to extinction need considering further. It has been suggested above that
the duration of any experiment is short compared with the extinction time of a single 
species, in contrast with the extinction time of one or other of the two competing species, 
but obviously more definite orders of magnitude for these extinction times would be useful. 


I am very grateful to David G. Kendall for some critical comments on my first draft of 
this paper. I am also indebted to Mrs L. Linnert and Miss C. Caley for assistance with the 
construction of the artificial realizations shown in the figures.


REFERENCES* 


ANDREWARTHA, H. G. & BIRCH, L. C. (1954). The Distribution and Abundance of Animals.
Chicago.

BACH, P. DE & SMITH, H. S. (1941). Are population oscillations inherent in the host-parasite relation?
Ecology, 22, 363-9.

BAILEY, N. T. J. (1950). A simple stochastic epidemic. Biometrika, 37, 193-202.

BARTLETT, M. S. (1955). An Introduction to Stochastic Processes. Cambridge University Press.

BARTLETT, M. S. (1956). Deterministic and stochastic models for recurrent epidemics. Proc. Third
Berkeley Symposium on Mathematical Statistics and Probability, 4, 81-109. University of California
Press.

BARTLETT, M. S. (1957). Measles periodicity and community size. J. R. Statist. Soc. A, 120 (in the
Press).

ELTON, C. S. (1955). Natural control of animal populations. Review of Andrewartha & Birch (1954),
Nature, Lond., 176, 619.

FELLER, W. (1939). Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in
wahrscheinlichkeitstheoretischer Behandlung. Acta biotheor., Leiden, 5, 11-40.

FELLER, W. (1940). On the logistic law of growth and its empirical verifications in biology. Acta
biotheor., Leiden, 5, 51-66.

GAUSE, G. F. (1934). The Struggle for Existence. Baltimore.

GAUSE, G. F. (1935). Experimental demonstration of Volterra's periodic oscillations in the numbers
of animals. J. Exp. Biol. 12, 44-8.

GAUSE, G. F., SMARAGDOVA, N. P. & WITT, A. A. (1936). Further studies of interaction between
predators and prey. J. Anim. Ecol. 5, 1-18.

KEMPTHORNE, O. et al. (Editors) (1954). Statistics and Mathematics in Biology. Ames, Iowa: Iowa
State College Press.

KENDALL, D. G. (1948). On the generalized 'birth-and-death' process. Ann. Math. Statist. 19,
1-15.

KENDALL, D. G. (1949). Stochastic processes and population growth. J. R. Statist. Soc. B, 11,
230-64.

LESLIE, P. H. (1948). Some further notes on the use of matrices in population mathematics.
Biometrika, 35, 213-45.


* Including some papers not mentioned in the text. 










LESLIE, P. H. (1957). An analysis of the data for some experiments carried out by Gause with
populations of the Protozoa Paramecium aurelia and Paramecium caudata. Biometrika, 44,
(to appear).

LOTKA, A. J. (1925). Elements of Physical Biology. Baltimore.

MCKENDRICK, A. G. (1926). Applications of mathematics to medical problems. Proc. Edinb. Math.
Soc. 44, 98-130.

NEYMAN, J., PARK, T. & SCOTT, E. L. (1956). Struggle for existence. The Tribolium model: biological
and statistical aspects. Proc. Third Berkeley Symposium on Mathematical Statistics and Probability,
4, 41-79. University of California Press.

PRATT, D. M. (1943). Analysis of population development in Daphnia at different temperatures.
Biol. Bull., Woods Hole, 85, 116-40.

SKELLAM, J. G. (1955). The mathematical approach to population dynamics. The Numbers of Man
and Animals (ed. by Cragg, J. B. & Pirie, N. W.) 31-46. Edinburgh: Oliver and Boyd.

VOLTERRA, V. (1926). Variazioni e fluttuazioni del numero d'individui in specie animali conviventi.
Mem. Acad. Lincei Roma, 2, 31-113.










THE CONSISTENCY AND ADEQUACY OF THE POISSON-MARKOFF 
MODEL FOR DENSITY FLUCTUATIONS 


By V. T. PATIL 


University of Manchester 


SUMMARY. The Poisson-Markoff or emigration-immigration process (as defined in §1.2) has been
used as an approximate model for the number of independently moving particles present in particular
regions of space, for example, as an approximate model in spermatozoa studies. The postulation of
such a model, valid for all modes of division of the space, is shown to be inconsistent in the sense of
leading to contradictions, except when $\tau$, the total interval of time considered, is small; the assumptions
(cf. §1.3) about the infinitesimal transition probabilities then ensure the consistency of the
model to $O(\tau)$. To study the adequacy of the Poisson-Markoff model for some data of spermatozoa
counts, the asymptotic $\chi^2$ goodness of fit test based on serial correlations is employed. The goodness
of fit tests indicate a significant departure of the spermatozoa data from the Poisson-Markoff model,
the departure being not so striking in the uniregional case as in the multiregional case.


Part I 


1.1. Consider a system of particles performing some type of stochastic motion in an
'infinite' space. Let $R_1, R_2, \ldots, R_k$ (any $k$ finite non-overlapping regions in the space)
and the complementary set $R^*$ represent a mode of division of the space, and let
$N_{r,t}$ ($t > 0$) be the number of particles in the region $R_r$ at time $t$, $\mathbf{N}_t$ being the column
vector with $N_{r,t}$ ($r = 1, 2, \ldots, k$) as its elements. Then the full specification of the
stochastic motion is sufficient to specify the process $\{\mathbf{N}_t\}$. However, not only is the
converse not true, but also there may not always be a type of stochastic motion leading
to the specified process $\{\mathbf{N}_t\}$. In §1.5, it will be shown that the postulation of the Poisson-Markoff
or emigration-immigration process for $\mathbf{N}_t$ (cf. §1.2), when assumed to be valid
for all modes of division of the space (as has been used in connexion with spermatozoa
studies) leads to contradictions and hence is not compatible with any type of stochastic
motion.


1-2. The Poisson—Markoff or emigration-immigration process. For setting up this process, 
we start with the following assumptions: 


(1) The particles are moving independently of each other. 


(2) The probability that precisely one of the particles in the region $R_r$ at time $t$ moves
into $R_s$ in time $dt$ is $N_{r,t}\lambda_{rs}\,dt + o(dt)$.


(3) The probability that precisely one of the particles in $R_r$ at time $t$ moves into $R^*$ in
time $dt$ is $N_{r,t}\lambda_r^*\,dt + o(dt)$.

(4) The probability that precisely one of the particles in $R^*$ at time $t$ moves into $R_r$ in
time $dt$ is $\mu_r\,dt + o(dt)$.

The process $\{\mathbf{N}_t\}$ is then Markovian (of the first order) and the distribution of $\mathbf{N}_t$ at a single
instant is, as pointed out below, that of $k$ independent Poisson variates. This Poisson-Markoff
process is also called the emigration-immigration process (Bartlett, 1949). The
parameters $\lambda_{rs}$, $\lambda_r^*$ and $\mu_r$ will remain unspecified for the moment. The detailed behaviour
of the Poisson-Markoff process is discussed below.








Let (i) the 'stochastic interaction-rate' matrix (cf. Ruben & Rothschild, 1955)

$$\begin{bmatrix} \lambda_1^* + \sum_{r \neq 1}\lambda_{1r} & -\lambda_{12} & -\lambda_{13} & \cdots & -\lambda_{1k} \\ -\lambda_{21} & \lambda_2^* + \sum_{r \neq 2}\lambda_{2r} & -\lambda_{23} & \cdots & -\lambda_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ -\lambda_{k1} & -\lambda_{k2} & -\lambda_{k3} & \cdots & \lambda_k^* + \sum_{r \neq k}\lambda_{kr} \end{bmatrix} \tag{1}$$

be denoted by $\Lambda$;
(ii) $K_1, \ldots, K_k$ be the latent roots† of the matrix $\Lambda$ and $K$ the diagonal matrix with
$K_1, \ldots, K_k$ as elements; and
(iii) the latent row vector matrix of $\Lambda$ be denoted by $\Theta$.
Then in statistical equilibrium the probability generating function of $\mathbf{N}_{t+\tau}$ and $\mathbf{N}_t$ is
given by

$$\Pi_{t,t+\tau}(z_t, z_{t+\tau}) = \exp\{m'(z_t - 1) + m'(z_{t+\tau} - 1) + m'(Z_t - I)P(\tau)(z_{t+\tau} - 1)\}, \tag{2}$$

where the meaning of the symbols in (2) is as follows. The column vector $m$ is defined by
$\mu' = m'\Lambda$, $\mu$ being the column vector with $\mu_1, \ldots, \mu_k$ as elements; $(z_T - 1)$ ($T = t, t+\tau$) is
the vector with $z_{r,T} - 1$ ($r = 1, 2, \ldots, k$) as elements, $z_{r,T}$ being the auxiliary variable corresponding
to $N_{r,T}$; $Z_t - I$ is the diagonal matrix with the elements $z_{r,t} - 1$ ($r = 1, \ldots, k$); and
$P(\tau) = \Theta^{-1}e^{-K\tau}\Theta$.

The proof of result (2), on the same lines as in Bartlett (1949), is given in Appendix 1.
The marginal distribution of $\mathbf{N}_t$ is, from (2),

$$\Pi_t(z_t) = \exp\{m'(z_t - 1)\}, \tag{3}$$

that is, that of $k$ independent Poisson variates. As the process $\{\mathbf{N}_t\}$ is Markovian, the two-point
probability distribution (2) of $\mathbf{N}_t$ is sufficient to specify the process fully.


1.3. The kinematic hypothesis. Nothing has been assumed about the form of the $\lambda$'s and
the $\mu$'s as yet. The following assumptions about them are now made, the assumptions being
in agreement with the collision frequency considerations in the kinetic theory of gases.
The proof of their agreement (cf. Rothschild, 1953a) is given in Appendix 2.

Let $A_r$ be the area of the region $R_r$, $B_{rs}$ be the length of the part of its boundary
which it shares with $R_s$ (where $r \neq s$) and $L_r$ the 'free' boundary of $R_r$ (which it shares
with $R^*$). For some values of $r$, $L_r$ may be zero; $\sum_{s \neq r} B_{rs} + L_r$ will always be the perimeter
of $R_r$.

The kinematic hypothesis then asserts that

$$\text{(i)}\ \lambda_{rs} = \rho_r\frac{B_{rs}}{A_r}, \qquad \text{(ii)}\ \lambda_r^* = \rho_r\frac{L_r}{A_r} \qquad \text{and} \qquad \text{(iii)}\ \mu_r = m_0\rho_rL_r, \tag{4}$$

where the $\rho_r$ are to be so chosen that the condition $m_0A'\Lambda = \mu'$ is satisfied, $\bar{c}$ being
the mean speed of the particles, $m_0$ the mean number of particles per unit area and
$A'$ the vector $(A_1, \ldots, A_k)$. It is easy to verify that the $\rho_r$'s thus chosen are all equal
to $\bar{c}/\pi$.
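The balance condition under the kinematic hypothesis can be verified on a small example. The sketch below builds the interaction-rate matrix for two adjoining rectangles and checks that the vector of mean numbers proportional to the areas satisfies the condition when every rho is set to the mean speed divided by pi. The geometry, the parameter values and the function names are assumptions introduced for illustration.

```python
import math


def kinematic_rates(areas, shared, free, c_bar, m0):
    """Rates under the kinematic hypothesis with every rho_r equal to c_bar / pi."""
    rho = c_bar / math.pi
    k = len(areas)
    lam = [[rho * shared[r][s] / areas[r] if r != s else 0.0
            for s in range(k)] for r in range(k)]
    lam_star = [rho * free[r] / areas[r] for r in range(k)]
    mu = [m0 * rho * free[r] for r in range(k)]
    return lam, lam_star, mu


def interaction_matrix(lam, lam_star):
    """Matrix with lam_star[r] + sum_s lam[r][s] on the diagonal, -lam[r][s] elsewhere."""
    k = len(lam_star)
    mat = [[-lam[r][s] for s in range(k)] for r in range(k)]
    for r in range(k):
        mat[r][r] = lam_star[r] + sum(lam[r][s] for s in range(k) if s != r)
    return mat


def row_times_matrix(v, mat):
    """Row vector v multiplied by the matrix mat."""
    k = len(v)
    return [sum(v[r] * mat[r][s] for r in range(k)) for s in range(k)]


# Hypothetical geometry: a 1 x 1 square and a 2 x 1 rectangle sharing a side of length 1.
areas = [1.0, 2.0]
shared = [[0.0, 1.0], [1.0, 0.0]]     # B_rs, common boundary lengths
free = [3.0, 5.0]                     # L_r, boundary shared with the outside
c_bar, m0 = 2.0, 4.0                  # assumed mean speed and mean density

lam, lam_star, mu = kinematic_rates(areas, shared, free, c_bar, m0)
big_lambda = interaction_matrix(lam, lam_star)
m = [m0 * area for area in areas]     # mean numbers proportional to area

print(row_times_matrix(m, big_lambda))
print(mu)
```

The two printed vectors should coincide, which is the statement that immigration balances the net outflow region by region when the mean numbers are proportional to the areas.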


† It is assumed that corresponding to an $r$-ple latent root, there exist $r$ linearly independent latent
vectors, whatever matrix $\Lambda$ is. This is true in the particular case considered in §1.5, as there $\Lambda$ is a
symmetric matrix.







The number of parameters is now reduced to two, namely, $\bar{c}$ and $m_0$. Thus the combination
of the Poisson-Markoff process with the kinematic hypothesis leads to a two-parameter
stochastic process, which has in fact been used in the study of the density fluctuations of
spermatozoa and which we wish to examine here. ($\bar{c}$ and $m_0$ may be any two parameters
and not necessarily the mean speed and the mean density of the particles, $A$ being replaced
by some other relevant vector, and the discussion below still holds.)


1.4. There are two aspects of this Poisson-Markoff process to be investigated. First, is the Poisson-Markoff process, when supposed to be valid for all modes of division of the space, consistent in the sense of being free from contradictions? If it is not, then evidently no type of stochastic motion can lead to it. Secondly, does the Poisson-Markoff process, irrespective of its theoretical appropriateness, give a 'good fit' to the data when the method of dividing up the space has been agreed upon beforehand and fixed? The second aspect is dealt with in relation to some data of spermatozoa counts in Part II of this paper. Regarding the first aspect, it is shown below that the postulation of the Poisson-Markoff process valid for all modes of division of the space is not consistent. As a simple case, consider the Poisson-Markoff model set up for $N_t$ in three contiguous regions $R_1$, $R_2$ and $R_3$. By averaging over the region $R_3$, the 'marginal' process concerning $R_1$ and $R_2$ alone can be derived. Alternatively, just these two regions could have been considered at first and the Poisson-Markoff process for $N_t$ in these two regions set up directly. The 'consistency' principle then requires that the two ways of finding the joint distribution of $N_{1,t}$ and $N_{2,t}$ should lead to the same answer. The consistency principle could be generalized to all the possible arrays of regions and selections from them. Here we consider the Poisson-Markoff process for $N_t$ in a two-dimensional network of congruent rectangles, a case commonly met with in practice, and illustrate how the consistency principle breaks down.


1.5. We set up the Poisson-Markoff process with the kinematic hypothesis for $N_t$ in this two-dimensional network of $k'$ rows and $k$ columns, as shown in the accompanying figure, and then consider the marginal process $\{N_t\}$ in one of the $k \times k'$ regions of the network.

[Figure: rectangular network of $k'$ rows and $k$ columns of congruent rectangles.]











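We first specify the stochastic interaction-rate matrix $\Lambda$ and the correlation matrix $P(\tau)$ for this network of $k \times k'$ regions. For this case,

(1) $A_r = A$, say, for all $r$.

(2) $$B_{rs} = \begin{cases} B, & \text{when } R_r \text{ and } R_s \text{ are contiguous regions in the same row,}\\ Bp', & \text{when } R_r \text{ and } R_s \text{ are contiguous regions in the same column,}\\ 0, & \text{otherwise.}\end{cases}$$

(3) $$\frac{1}{B}\Big(L_r + \sum_{s \ne r} B_{rs}\Big) = p, \quad \text{for every region.}$$

Obviously $p = 2(1 + p')$.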





46 Poisson—Markoff model for density fluctuations 
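Let $B\bar{c}/A\pi$ be denoted by $\lambda$. Then

$$\Lambda = \Lambda_{kk'} = \begin{bmatrix}
\Lambda_k & -p'\lambda I_k & (0) & \cdots & (0)\\
-p'\lambda I_k & \Lambda_k & -p'\lambda I_k & \cdots & (0)\\
(0) & -p'\lambda I_k & \Lambda_k & \cdots & (0)\\
\vdots & & \ddots & \ddots & -p'\lambda I_k\\
(0) & (0) & \cdots & -p'\lambda I_k & \Lambda_k
\end{bmatrix}, \tag{5}$$

where $\Lambda_k$ is the $k \times k$ matrix

$$\Lambda_k = \lambda \begin{bmatrix}
p & -1 & 0 & \cdots & 0\\
-1 & p & -1 & \cdots & 0\\
0 & -1 & p & \cdots & 0\\
\vdots & & \ddots & \ddots & -1\\
0 & 0 & \cdots & -1 & p
\end{bmatrix}$$

and $I_k$ is the unit $k \times k$ matrix.

The matrix $\Lambda_{kk'}$ may also be expressed as

$$\Lambda_{kk'} = \Lambda_k (\times) I_{k'} + I_k (\times) J_{k'},$$

where the symbol $(\times)$ denotes the outer product, $I_{k'}$ the unit $k' \times k'$ matrix and $J_{k'}$ is the $k' \times k'$ matrix

$$J_{k'} = p'\lambda \begin{bmatrix}
0 & -1 & 0 & \cdots & 0\\
-1 & 0 & -1 & \cdots & 0\\
0 & -1 & 0 & \cdots & 0\\
\vdots & & \ddots & \ddots & -1\\
0 & 0 & \cdots & -1 & 0
\end{bmatrix}.$$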
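For obtaining the latent roots and latent vectors of $\Lambda_{kk'}$, we observe that

(1) the latent roots of $\Lambda_k$ are

$$\kappa_r = \lambda(p - 2\cos\omega_r) \quad (r = 1, 2, \dots, k),$$

where $\omega_r = r\pi/(k+1)$;

(2) its latent row vector $\eta_r'$ corresponding to $\kappa_r$ is

$$\eta_r' = \sqrt{\frac{2}{k+1}}\,\big(\sin k\omega_r,\ \sin (k-1)\omega_r,\ \dots,\ \sin\omega_r\big) \quad \text{(Montroll, 1947)};$$

(3) the latent roots of $J_{k'}$ are

$$l_{r'} = -2\lambda p'\cos\omega'_{r'} \quad (r' = 1, 2, \dots, k'),$$

where $\omega'_{r'} = r'\pi/(k'+1)$;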










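(4) the latent vectors of $J_{k'}$ are the same as those of $\Lambda_{k'}$. We denote the latent root matrices of $\Lambda_k$ and $J_{k'}$ by $K_k$ and $L_{k'}$ respectively and their latent vector matrices by $\Theta_k$ and $\Theta_{k'}$ respectively. Then the latent root matrix and the latent vector matrix of $\Lambda_{kk'}$ are respectively given by

$$K_{kk'} = K_k (\times) I_{k'} + I_k (\times) L_{k'}, \qquad \Theta_{kk'} = \Theta_k (\times) \Theta_{k'}. \tag{6}$$

For

$$\begin{aligned}
\Theta_{kk'} \Lambda_{kk'} \Theta_{kk'}' &= \Theta_k (\times) \Theta_{k'}\,[\Lambda_k (\times) I_{k'} + I_k (\times) J_{k'}]\,\Theta_k' (\times) \Theta_{k'}'\\
&= [\Theta_k \Lambda_k \Theta_k'] (\times) [\Theta_{k'} I_{k'} \Theta_{k'}'] + [\Theta_k I_k \Theta_k'] (\times) [\Theta_{k'} J_{k'} \Theta_{k'}']\\
&= K_k (\times) I_{k'} + I_k (\times) L_{k'},
\end{aligned}$$

on our observing that $\Theta_{kk'}' = \Theta_k' (\times) \Theta_{k'}'$ and that, as $\Lambda_k$ and $J_{k'}$ are symmetric matrices, $\Theta_k$ and $\Theta_{k'}$ may be taken to be orthogonal matrices.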
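The latent-root structure stated in (1) to (4) and in (6) can be checked numerically for a small network. The following is a minimal sketch in plain Python; the helper names are hypothetical, `kron` implements the usual Kronecker product as a stand-in for the outer product $(\times)$, and the values of $\lambda$ and $p'$ are arbitrary.

```python
import math

def tridiag(k, diag, off):
    """k x k matrix with `diag` on the diagonal and `off` on the two adjacent diagonals."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(k)] for i in range(k)]

def identity(k):
    return tridiag(k, 1.0, 0.0)

def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    ka, kb = len(a), len(b)
    return [[a[i // kb][j // kb] * b[i % kb][j % kb]
             for j in range(ka * kb)] for i in range(ka * kb)]

def sine_vector(k, r):
    """Normalised latent vector (sin k w_r, ..., sin w_r) with w_r = r pi / (k + 1)."""
    w = r * math.pi / (k + 1)
    scale = math.sqrt(2.0 / (k + 1))
    return [scale * math.sin((k - j) * w) for j in range(k)]

def network_matrix(k, k2, lam, p_prime):
    """Lambda_k (x) I_k' + I_k (x) J_k' for k columns and k2 rows."""
    p = 2.0 * (1.0 + p_prime)
    first = kron(tridiag(k, p * lam, -lam), identity(k2))
    second = kron(identity(k), tridiag(k2, 0.0, -p_prime * lam))
    n = k * k2
    return [[first[i][j] + second[i][j] for j in range(n)] for i in range(n)]

def residual(k, k2, r, r2, lam, p_prime):
    """Largest component of (Lambda v - root * v) for the product latent vector v."""
    big = network_matrix(k, k2, lam, p_prime)
    u, v = sine_vector(k, r), sine_vector(k2, r2)
    vec = [u[i] * v[j] for i in range(k) for j in range(k2)]
    p = 2.0 * (1.0 + p_prime)
    root = (lam * (p - 2.0 * math.cos(r * math.pi / (k + 1)))
            - 2.0 * lam * p_prime * math.cos(r2 * math.pi / (k2 + 1)))
    n = k * k2
    return max(abs(sum(big[i][j] * vec[j] for j in range(n)) - root * vec[i])
               for i in range(n))

worst = max(residual(3, 2, r, r2, lam=0.7, p_prime=0.5)
            for r in range(1, 4) for r2 in range(1, 3))
```

For this 3-column, 2-row example `worst` is at rounding-error level, so each product of sine vectors is a latent vector with latent root $\kappa_r + l_{r'}$.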
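As the regions are equal in area, the $m$'s are equal and the matrix

$$P(\tau) = \Theta^{-1} e^{-K\tau}\,\Theta = \Theta' e^{-K\tau}\,\Theta,$$

$\Theta$ being an orthogonal matrix, reduces to the correlation matrix between $N_t$ and $N_{t+\tau}$. Making use of (6) it can be easily shown that (i) the auto-correlation coefficient for the region in the $l$th column and $l'$th row is

$$\rho_{ll'}(\tau) = e^{-p\lambda\tau} \times \frac{2}{k+1}\left[\sum_{r=1}^{k} \sin^2 (l\omega_r)\, e^{2\lambda\tau\cos\omega_r}\right] \times \frac{2}{k'+1}\left[\sum_{r'=1}^{k'} \sin^2 (l'\omega'_{r'})\, e^{2p'\lambda\tau\cos\omega'_{r'}}\right]; \tag{7}$$

and (ii) the cross-correlation coefficient pertaining to the region in the $l_1$th column and the $l_1'$th row and the region in the $l_2$th column and the $l_2'$th row is

$$\rho_{l_1 l_1',\, l_2 l_2'}(\tau) = e^{-p\lambda\tau} \times \frac{2}{k+1}\left\{\sum_{r=1}^{k} \sin (l_1\omega_r) \sin (l_2\omega_r)\, e^{2\lambda\tau\cos\omega_r}\right\} \times \frac{2}{k'+1}\left\{\sum_{r'=1}^{k'} \sin (l_1'\omega'_{r'}) \sin (l_2'\omega'_{r'})\, e^{2p'\lambda\tau\cos\omega'_{r'}}\right\}. \tag{8}$$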


The two-point probability distribution (2) for $N_t$ in the network of $k \times k'$ regions is thus completely specified, and thus the process $\{N_t\}$, being Markovian, may be fully specified. From this multivariate process, the process pertaining to the region in the $l$th column and $l'$th row may be obtained as a marginal process. In particular, the two-point probability distribution corresponding to this process is

$$\Pi_{t,t+\tau}(z_t, z_{t+\tau}) = \exp\{m(z_t - 1) + m(z_{t+\tau} - 1) + m\rho_{ll'}(\tau)(z_t - 1)(z_{t+\tau} - 1)\}. \tag{9}$$

The consistency principle requires that the distribution (9) should be invariant to changes in $l$, $l'$, $k$ and $k'$. An examination of the expression (7) for $\rho_{ll'}(\tau)$ shows that this is not the case. Similarly, the consideration of other more complicated situations would be expected to lead to the same conclusion.

Formula (7) and hence the result (9) can easily be verified for simpler cases, say, for

(i) $k = k' = 1$, $l = l' = 1$ (i.e. the Poisson-Markoff model for region $R_1$ alone is considered),

and (ii) $k = 2$ and $k' = 1$, $l = l' = 1$ (i.e. the Poisson model for region $R_1$ is obtained as a marginal model from the Poisson-Markoff model for regions $R_1$ and $R_2$).










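Then
$$\rho(\tau) = e^{-p\lambda\tau}, \quad \text{in case (i)}, \tag{7a}$$
$$\rho(\tau) = e^{-p\lambda\tau}\cosh\lambda\tau, \quad \text{in case (ii)}. \tag{7b}$$
Since $\lambda \ne 0$, $e^{-p\lambda\tau} \ne e^{-p\lambda\tau}\cosh\lambda\tau$.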


Hence the inconsistency of the Poisson—Markoff model is evident. 
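The two special cases can be reproduced directly from formula (7). The sketch below is a plain transcription of (7) into Python; the function name and the numerical values of $\lambda$, $p'$ and $\tau$ are hypothetical, chosen only for illustration.

```python
import math

def autocorrelation(l, l2, k, k2, lam, p_prime, tau):
    """Formula (7): auto-correlation at lag tau for the region in column l and
    row l2 of a network with k columns and k2 rows."""
    p = 2.0 * (1.0 + p_prime)
    col = 2.0 / (k + 1) * sum(
        math.sin(l * r * math.pi / (k + 1)) ** 2
        * math.exp(2.0 * lam * tau * math.cos(r * math.pi / (k + 1)))
        for r in range(1, k + 1))
    row = 2.0 / (k2 + 1) * sum(
        math.sin(l2 * r * math.pi / (k2 + 1)) ** 2
        * math.exp(2.0 * p_prime * lam * tau * math.cos(r * math.pi / (k2 + 1)))
        for r in range(1, k2 + 1))
    return math.exp(-p * lam * tau) * col * row

lam, p_prime, tau = 0.8, 0.5, 1.5           # illustrative values only
p = 2.0 * (1.0 + p_prime)
case_i = autocorrelation(1, 1, 1, 1, lam, p_prime, tau)    # one region alone
case_ii = autocorrelation(1, 1, 2, 1, lam, p_prime, tau)   # same region, two-region network
```

With these values `case_i` equals $e^{-p\lambda\tau}$ and `case_ii` equals $e^{-p\lambda\tau}\cosh\lambda\tau$, so the same region receives two different auto-correlations depending on how many neighbours are included in the model.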

It follows, therefore, that there does not exist any stochastic motion that leads to the Poisson-Markoff process for $N_t$, valid for all modes of division of the space. However, the kinematic hypothesis ensures that, for small $\tau$, the Poisson-Markoff process is consistent to $O(\tau)$. Also, in certain situations at least, the Poisson-Markoff process might be an adequate approximation to a consistent process for $N_t$ governing any particular data of density fluctuations. This contention involves the question of goodness of fit and is examined in Part II of this paper, in relation to some actual experimental data.


Part II


2.1. Goodness of fit tests. The multivariate Poisson-Markoff model for $N_t$, for large mean number density, degenerates into a normal Markoff model. Hence it can be represented asymptotically as a multivariate autoregressive model of the first order, and the asymptotic $\chi^2$ goodness of fit test based on serial correlations (the G-test or the H-test) (Bartlett & Rajalakshman, 1953; Quenouille, 1947) can then be applied.

Thus denoting the conditional expectation of $N_t$ for given $N_0$ at time $t_0$, $t > t_0$, by $E(N_t \mid N_0)$, we have from (2), § 1.2,

$$E(N_t - m \mid N_0) = P(t - t_0)'\,(N_0 - m), \tag{1}$$

a linear regression formula. Also

$$E[(N_t - m)(N_{t+\tau} - m)'] = MP(\tau), \tag{2}$$

$M$ being the diagonal matrix $(m_1, m_2, \dots, m_k)$. Therefore, the model may be specified by the stochastic equations

$$(N_t - m) - P(t - t_0)'\,(N_0 - m) = \epsilon_t, \tag{3}$$

where the probability distribution of $\epsilon_t$ may be written down from that of the process $\{N_t\}$. In particular, when $t_0 = t - \tau$, $\tau$ being fixed, and $t$ takes the values $a + r\tau$ $(r = 0, \pm 1, \pm 2, \dots)$,

$$E(\epsilon_t) = (0) \quad \text{and} \quad E(\epsilon_t \epsilon_{t'}') = (0) \quad (t \ne t').$$

As $m_0 \to \infty$, the Poisson-Markoff model tends to a normal Markoff model and the variates $\epsilon_t$ tend to be normal and to be distributed independently of $\epsilon_{t'}$ $(t \ne t')$. In this case, the asymptotic H-test or the G-test can be used to assess the goodness of fit of the model (3) to any data consisting of a large number of observations. For finite $m_0$, $\epsilon_t$ and $\epsilon_{t'}$ $(t \ne t')$ are not distributed independently, as an examination of the joint distribution of $\epsilon_t$ and $\epsilon_{t'}$ would show. The effect of this dependence on the G-test or the H-test can, however, be assessed. This is done in § 2.2. In § 2.3 is given the summary of the asymptotic $\chi^2$ goodness of fit test applied to bull spermatozoa data.


2.2. The effect of the dependence of the residuals $\epsilon_t$ for finite $m_0$ on the G-test. We consider the one-variate case first. From equation (3) we have

$$(N_t - m) - \rho(N_{t-1} - m) = \epsilon_t,$$

the interval of time between consecutive observations being taken as the unit of time and $\rho = P(1) = e^{-\lambda}$, where the probability of one particle leaving $R$ in time $dt$ is $\lambda\,dt + o(dt)$.
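The one-variate model can be simulated to see the first-order autoregressive behaviour. The sketch below rests on a standard representation that is assumed here rather than taken from the text: over one unit of time each particle present stays in $R$ independently with probability $\rho = e^{-\lambda}$, and the number of new arrivals is Poisson with mean $m(1-\rho)$. The function names, seed and parameter values are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate_counts(m, rho, n, seed=0):
    """Counts N_1, ..., N_n for one region, started from the Poisson(m) equilibrium."""
    rng = random.Random(seed)
    count = poisson(rng, m)
    series = []
    for _step in range(n):
        survivors = sum(1 for _particle in range(count) if rng.random() < rho)
        count = survivors + poisson(rng, m * (1.0 - rho))
        series.append(count)
    return series

def summary(series):
    """Sample mean, variance/mean ratio and lag-one serial correlation."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series) / n
    c1 = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1)) / (n - 1)
    return mean, c0 / mean, c1 / c0

mean, ratio, r1 = summary(simulate_counts(m=20.0, rho=0.6, n=20000, seed=1))
```

For a long series the sample mean should lie close to $m$, the variance/mean ratio close to unity and the lag-one serial correlation close to $\rho$, which are the properties the tests of this Part rely on.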










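This can be written as
$$H_t X_t = \epsilon_t, \tag{4}$$

where $H_t = 1 - \rho E_t^{-1}$, $E_t^{-1}$ being the displacement operator changing $X_t$ to $X_{t-1}$, and $X_t = N_t - m$. Then since $E(X_{u+t}X_u) = \rho^{|t|}m$, it follows that

$$E(\epsilon_t X_{t-u}) = 0 \quad (u > 0) \quad \text{and} \quad E(\epsilon_u \epsilon_{u'}) = m(1 - \rho^2)\,\delta(u - u'), \tag{5}$$

where $\delta(u - u') = 0$ $(u \ne u')$; $= 1$ $(u = u')$.

Let $D_t = H_t C_t$, where $C_t = \dfrac{1}{n-t}\sum_{u=1}^{n-t} X_u X_{u+t}$. Then

$$D_t \sim \frac{1}{n-t}\sum_{u=1}^{n-t} X_u \epsilon_{u+t}, \qquad E(D_t) \sim \frac{1}{n-t}\,E\left[\sum_{u=1}^{n-t} X_u \epsilon_{u+t}\right] = 0 \quad (t > 0). \tag{6}$$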

Further it can be shown that 


Se 





(t>7). (7) 
Hence from the result (7) 
E(H,D,H, D,) = E{(D,—pD,4)(D,—pD,-1)] 
= E(D,D,)— pE(D,D,_4)— pE(D_1D,) + PE(DD, 1) 





0 (t + 7) 
~ yn F st OE gS 
—, ee eo (¢ = 7). (8) 
If r,=C,/C,, then 
0 (t+7), 
E(H}r,H?r,)~} (1—p)2p!  (1—p?)? . 
(HinHir)~)(—pko  -PF 4 _ ©) 
m(n —t) n—t 


The effect of the residuals $\epsilon_t$ (cf. equation (14)) is, therefore, only to raise the variance of $H_t r_t$, since $\rho > 0$ here, while $H_t r_t$ and $H_\tau r_\tau$ are still asymptotically uncorrelated when $t \ne \tau$. The proportionate increase in the variance of $H_t r_t$ is less than $1/m$ and hence tends to zero, as it must do, when $m \to \infty$.

Working on the same lines for the Poisson-Markoff model for $N_t$ in two contiguous equal regions, we get results similar to those in the one-variate case, namely, only the variances of the $Q$'s (the linear combinations of the serial correlation coefficients for the G-test) are increased and the simultaneous covariances of the $Q$'s are altered, both the proportionate increments being less than $1/m_0$. This indicates that similar results may be expected for the $k$-variate Poisson-Markoff model for $N_t$ in general.

Similarly, the effect of the dependence of the residuals on the H-test can be determined and shown to be negligible for large $m_0$.


2.3. Results of the G-test applied to spermatozoa data. The data consist of the series of observed numbers of spermatozoa in 20 equal squares† forming a $4h \times 5h$ rectangle, observed at regular intervals of time, in film 186 of Lord Rothschild's data of bull spermatozoa. (For a detailed description of the data see Rothschild (1953b).) For the purpose of the test, the twenty squares are divided into four groups of five squares each, forming four rectangles in a line, and the observations in the five squares of each of the groups are pooled. The interval of time between consecutive observations is taken as the unit of time.

† Actually the twenty subregions are rectangles 63.4 × 67.8 μ.

For this case, 


Cys 2.8 
roe in, “w «t 
~" 0-1 ¥ -1F 
i a oe | 
Equation (3) of § 2-1, viz. 
(N,—m)— P(1)’(N,_,—m) = €, (10) 


where e€, is for large m, a normal residual vector distributed independently of €,, and 
P(1) = O,e-*:0,, where ©, and K, are as defined in § 1-2, is transformed to 


A,—e*s A,_, = 1, (11) 


where 0,(N,—m) = A, and y, = O,€,. yn, and yy (f+t’) are now asymptotically indepen- 
dently distributed. Let the vector A, have A,, B,, C,and D, as its elements. The correlation 
coefficients between A,, A,,, and A,, B,,, are respectively denoted by p, 4(7), P4,(7) and 
similar notations are used for the other auto- and cross-correlations. It can easily be seen 
that the cross-correlations are zero and p, 4(7) = e~*:”. The corresponding serial auto- and 
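cross-correlations are denoted by $r_{AA}(\tau)$, $r_{AB}(\tau)$, etc., and the expressions used for them are, e.g.

$$r_{AB}(\tau) = \frac{\sum_{t=1}^{n-\tau} (a_t - \bar{a})(b_{t+\tau} - \bar{b})/(n - \tau)}{\sqrt{\left[\sum_{t=1}^{n} (a_t - \bar{a})^2/n\right]\left[\sum_{t=1}^{n} (b_t - \bar{b})^2/n\right]}},$$

where $a_t$ and $b_t$ are the observed values corresponding to $A_t$ and $B_t$ respectively,

$$\bar{a} = \sum_{t=1}^{n} a_t/n \quad \text{and} \quad \bar{b} = \sum_{t=1}^{n} b_t/n.$$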
For the data, n = 100. The x's for the G-test are 


x2 ae x [ie 1) = 1m) Pmm( mr) 
, landm L (1 —Poam(1))/n 
and x#7= > at? —"imlt = 1) (Pa) + Pmm(1)) + Tilt = "eat Pal) (t¢>1) 
lends (1 —p3(1)) (1 — p3an(1))/n : 

where the summations are over / and m, / and m becoming A, B, C and D. 

The observed mean densities for the four rectangles are 32.17, 28.94, 31.35, 36.37, the interval of time between consecutive counts is 0.4 sec. and $h \approx 65.5$.

The values of x? and y3 are given in the first row of Table 1, the estimate used for p=e~ 


7 r= [Z{Ay/var (A,)}/S{1/var(Ay)}] pan (12) 
where the summation is over A, B, C and D, 
Ag = (7 44(1))* 
1 


and 87! = 32—2cos,, 8g! = 12 —2 cosy, 8g! = 12—2 cosy, sp! = 12—2 cos a,, w, being 
equal to r7/5. It may be noted that 8, = A/k,, 8g = A/Ks, ete. 







Some empirical values of $\rho$ were also used to calculate $\chi_1^2$ and $\chi_2^2$, and in rows 2 and 3 of Table 1 are given the values of $\chi_1^2$ and $\chi_2^2$ for those values of $\rho$ for which $\chi_1^2 + \chi_2^2$ appeared to be minimum. (For individual correlations see Table 4 given in Appendix 3.) The $\chi^2$'s are highly significant, indicating a departure of the data from the four-variate Poisson-Markoff model.














Table 1

Film 186                                                 χ₁²     χ₂²     χ₁² + χ₂²
ρ = r = 0.9208, as obtained from equation (12)            76     310       386
    D.F.                                                  15      16        31
ρ =        and for which entry in col. 3 appears         105      71       176
  = 0.84   to be minimum                                  94      81       175
    D.F.                                                  15      16        31








The G-test in the one-variate case. The data used for the one-variate case are the five independent series of observations of spermatozoa counts in squares (film 191, shot 1, shot 3, shot 4, shot 5, shot 6 of Lord Rothschild's data of bull spermatozoa). The G-test is applied to each of the series separately. $m$ is estimated by the sample mean and $\rho$ either by (i) $r$, the serial correlation coefficient, or by (ii)

$$r' = 1 - \frac{\sum_{t=2}^{n} (n_t - n_{t-1})^2/(n - 1)}{2\sum_{t=1}^{n} n_t/n}.$$


It should be noticed that the variances of 7 and 7’ are 


l = 
var (r)~= (1p) +0(-—*) 





—p)? - 
and var (r')~ 9 Ot p) +6) 


(Lindley, 1954; Ruben & Rothschild, 1955). Hence when $\rho > \sqrt{2} - 1$, $\operatorname{var}(r') < \operatorname{var}(r)$ and therefore $r'$ is a more efficient estimate of $\rho$ than $r$.


The values of (1p) 
n 


6 6 
vad = d Hin] 
t=2 t=2 

are given in Table 2 for the two cases (i) and (ii). When $r$ is used as the estimate of $\rho$, the $\chi^2$-test does not indicate a significant departure of the data from the Poisson-Markoff model, whereas it does so when $r'$ is used as the estimate of $\rho$. This should not be the case if the Poisson-Markoff hypothesis is true, since the errors of both the estimates $r$ and $r'$ are $O(n^{-1/2})$. On the other hand, if the Poisson-Markoff hypothesis is not valid, whether the discrepancy is brought out or not may depend upon the estimate used. To examine this question we may test whether the sample variance/mean ratio, $(C_0/\bar{n})$, which is asymptotically normally distributed, differs significantly from unity, the theoretical value assumed in the use of $r'$.

† In both the uniregional and the multiregional cases the $m$'s are fairly large and hence the effect of finite $m$ on the tests has been neglected.

On the hypothesis of the Poisson-Markoff model



































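$$E\left(\frac{C_0}{\bar{n}}\right) \sim 1 - \frac{1}{n}\left(\frac{1+\rho}{1-\rho}\right) \quad \text{and} \quad \operatorname{var}\left(\frac{C_0}{\bar{n}}\right) \sim \frac{2}{n}\left(\frac{1+\rho^2}{1-\rho^2}\right). \tag{13}$$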
Table 2

Film 191                          Shot 1    Shot 3    Shot 4    Shot 5    Shot 6    Total
Estimate of ρ = r                 0.5426    0.3755    0.5606    0.4038    0.4313
χ²                                4.1       1.6       1.1       4.1       3.3       14.2
Estimate of ρ = r'                0.7644    0.6373    0.6596    0.6943    0.7714
χ²                                17…       …4        1.9       16.2      30.1      82.3
D.F.                              5         6         5         5         5         25
m = sample mean                   24.9911   16.0446   15.8850   12.8889   20.7719
n = no. of observations           112       112       113       117       114
τ = interval of time between      0.2279    0.2288    0.2250    0.2205    0.2248
    consecutive observations
    in sec.
Table 3 
| | 5 id rt con! | 
Film 191 | Shot 1 Shot3 | Shot4 | Shot5 | Shot 6 | N5C | to | 
i Tin Ales (ESS Oe i ee Oe , 
| | | | 
Cc -1:3 | | 33 — 6-2 


ay 1 «ts | att | «tt 


' 








C=A.M. of the C’s for the five series. tg=,/5 0/,/[£(C —C)?/4]. 


This follows from 








= im. m(l+p 
E(7) =m, Val @m~™ (774), 
l Be 2m?1+p? ml+p 
E(C,) m(1 ial var (C,) ~ — i-ftai-p’ 


Cov (Op, 71) ~~ 


and using the method of statistical differentials to obtain $E(C_0/\bar{n})$ and $\operatorname{var}(C_0/\bar{n})$. Table 3
gives the values of 
¢ (6-1 oar 
var (C,/%) n nl—r’}}\n1l—r? 
for the five series. The variate $\sqrt{5}\,\bar{C}$ is asymptotically normal with mean zero and unit variance, while $t_C$ is a $t$-variate with 4 d.f. Both are here significantly large, bringing out










a significant deviation of the data from the one-variate Poisson-Markoff model with particular reference to the variance/mean ratio.

The Poisson-Markoff model, therefore, does not appear to give a good fit either to the multiregional or to the uniregional bull spermatozoa data considered here. In the uniregional case the evidence for the departure of the data from the Poisson-Markoff model is not as striking as in the multiregional case, since the value of the pooled $\chi^2$'s for the five series is not significantly large when the serial correlation coefficient is used as an estimate of $\rho$.†

Since the Poisson-Markoff model is consistent only over small total intervals of time, it is desirable to specify the stochastic motion fully and then to base the stochastic models for $N_t$ on it, so that these models would be free from contradictions. The stochastic models for $N_t$ are, in general, not Markovian (Bartlett, 1954). However, the closeness of approximation of the Poisson-Markoff model to such consistent models may then be studied, and the Poisson-Markoff model, which is comparatively simple, can be used as an approximate model in place of the original one, if the closeness of the approximation is found adequate for the particular situation under consideration. An investigation of some specific models for $N_t$ based on fully specified stochastic motion has been carried out and it is hoped to publish an account of this work in due course.


I am extremely grateful to Prof. M.S. Bartlett for his valuable guidance. I am indebted 
to Lord Rothschild for making the spermatozoa data available to me; to Lord Rothschild 
and Dr H. Ruben and to Mr D. V. Lindley for allowing me to read the draft of their papers 
before their publication. I thank the referees for detailed suggestions regarding the first 
and second parts. 


APPENDIX 1 


Proof of formula (2), § 1.2


The infinitesimal probabilities of transition from one state to the others of the process $\{N_t\}$ are expressed in tabular form below. Here we have written $n_r$ for $N_{r,t}$.

Transition from state at time t              Probability of
to state at time t + dt                      the transition

n_r → n_r + 1                                μ_r dt + o(dt)
n_r → n_r − 1                                n_r λ*_r dt + o(dt)
n_r → n_r − 1, n_s → n_s + 1                 n_r λ_rs dt + o(dt)
all other transitions from one               o(dt)
  state to a different state

Hence no change: $1 - \sum_r \mu_r\,dt - \sum_r \big(\lambda_r^* + \sum_{s \ne r} \lambda_{rs}\big) n_r\,dt - o(dt)$.

Then the partial differential equation for $\Pi = \Pi_t(z)$, the p.g.f. of $N_t$, obtained as in Bartlett (1949) by taking into consideration all these possible transitions, is

$$\frac{\partial \Pi}{\partial t} = \Pi \sum_r \mu_r(z_r - 1) + \sum_r \frac{\partial \Pi}{\partial z_r}\Big\{\sum_{s \ne r} \lambda_{rs}(z_s - z_r) + \lambda_r^*(1 - z_r)\Big\}. \tag{1}$$

† The results for this particular test, which was made first, were quoted by Prof. Bartlett in his book An Introduction to Stochastic Processes (1955, p. 274); the other results had not then been obtained.










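The auxiliary equations of this are

$$\frac{dt}{1} = \frac{-dz_r}{\sum_{s \ne r} \lambda_{rs}(z_s - z_r) + \lambda_r^*(1 - z_r)} = \frac{d\Pi}{\Pi \sum_r \mu_r(z_r - 1)}. \tag{2}$$

With the matrix $\Lambda$ as given in § 1.2, let the vector $m' = (m_1, m_2, \dots, m_k)$ be defined by

$$\mu' = m'\Lambda. \tag{3}$$

(It should be noted that $|\Lambda| = \kappa_1\kappa_2 \cdots \kappa_k \ne 0$; see the remark on the latent roots of $\Lambda$ following equation (12).)

For any linearly transformed variate $\alpha'(z - 1)$, equations (2) give

$$\frac{d\Pi}{\Pi\,\mu'(z - 1)} = \frac{-d\alpha'(z - 1)}{-\alpha'\Lambda(z - 1)} \tag{4}$$

and

$$\frac{dt}{1} = \frac{-d\alpha'(z - 1)}{-\alpha'\Lambda(z - 1)}. \tag{5}$$

First take $\alpha$ in (4) as identical with $m$, and then, with the use of (3), we get

$$\frac{d\Pi}{\Pi} = dm'(z - 1), \tag{6}$$

so that $\Pi e^{-m'(z-1)} = K_1$, where $K_1$ is any arbitrary constant. Next take $\alpha'$ as identical with $\eta_r'$, the latent row vector of $\Lambda$ corresponding to the latent root $\kappa_r$, and equation (5) becomes

$$\frac{dt}{1} = \frac{-d\eta_r'(z - 1)}{-\kappa_r\,\eta_r'(z - 1)} \quad (r = 1, \dots, k), \tag{7}$$

whence $\eta_r'(z - 1)\,e^{-\kappa_r t} = \text{constant}$.

The general solution of equation (1) is therefore

$$\Pi_t(z)\,e^{-m'(z-1)} = \psi\big(\eta_1'(z - 1)e^{-\kappa_1 t},\ \eta_2'(z - 1)e^{-\kappa_2 t},\ \dots,\ \eta_k'(z - 1)e^{-\kappa_k t}\big), \tag{8}$$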


where $\psi$ is an arbitrary function, the form of which may be determined from the initial conditions. If initially the numbers of particles in the regions $R_1, \dots, R_k$ are $n_1, \dots, n_k$ respectively, then taking the initial instant to correspond to $t = 0$,

$$\Pi_0(z) = \prod_{r=1}^{k} z_r^{n_r}.$$

Hence, putting $t = 0$ in (8),

$$\prod_{r=1}^{k} z_r^{n_r}\,e^{-m'(z-1)} = \psi\big(\eta_1'(z - 1), \dots, \eta_k'(z - 1)\big). \tag{9}$$

Now with $\Theta$ defined as in § 1.2 let the vector $y$ be defined by

$$\Theta(z - 1) = y \quad \text{or equivalently} \quad z - 1 = \Theta^{-1}y \tag{10}$$

and let $\xi_r'$ be the row vector of $\Theta^{-1}$ corresponding to $(z_r - 1)$. Then equation (9) can be written as

$$\psi(y_1, \dots, y_k) = e^{-\sum_r m_r \xi_r' y} \prod_{r=1}^{k} (1 + \xi_r' y)^{n_r}. \tag{11}$$

With this expression for y, equation (8) becomes 


k 
I1,(z | n)e-™*-» = exp{— YS m[E,4(ni(z — 1) e-*s*) ]} il {1 + DEr(n(2— 1) e~Xst)]" 
er 


= exp{—[m/O-! eK Q(z pT +e yy", (12) 
where G; is the rth row vector of the matrix 
P(t)=O-' e-K9, 


It can be shown, by expanding $|\Lambda|$ in terms of the $\lambda_r^*$'s in the diagonal elements and employing the method of mathematical induction, that all the principal minors of $\Lambda$ are positive. Hence all the latent roots of $\Lambda$ are positive (cf. Bartlett, 1949). Therefore when $t \to \infty$,

$$e^{-\kappa_s t} \to 0, \quad \text{for all } s,$$

and (12) gives immediately

$$\Pi_\infty(z \mid n) = e^{m'(z-1)}. \tag{13}$$







This probability distribution is independent of the initial value of $N_t$, i.e. the value of $N_t$ at $t = 0$. Hence for all $t$, and in particular for $t = 0$, we take
$$\Pi_t(z) = e^{m'(z-1)}. \tag{14}$$

From formulae (12) and (14), expression (2) of § 1.2 for the p.g.f. of $N_t$ and $N_{t+\tau}$ may be obtained.


APPENDIX 2 
The proof that in two dimensions the collision frequency is $m_0\bar{c}/\pi$; also the proof of formulae (4), § 1.3

Let $c$ be the vector $\begin{pmatrix} c_x \\ c_y \end{pmatrix}$ representing the velocity of a particle in two dimensions, and $f(c)$ the probability density for the velocity. Consider the number $N$ of particles passing in time $dt$ from one side of an infinitesimal element of length $ds$ to the other. Then it is easily seen that

$$N = \int_{c_y > 0} m_0\,ds\,c_y\,dt\,f(c)\,dc, \tag{1}$$

where the $y$-axis is taken along the normal to $ds$, the positive direction pointing towards that side of $ds$ to which the particles move. On transforming to polar coordinates $c$ and $\theta$ given by $c_x = c\cos\theta$, $c_y = c\sin\theta$, and integrating over $\theta$, equation (1) becomes

$$N = \frac{m_0\,ds\,dt}{\pi}\int_0^\infty c f(c)\,dc = \frac{m_0\bar{c}}{\pi}\,ds\,dt.$$

Hence the number of particles crossing a unit length in unit time is given by

$$\frac{m_0\bar{c}}{\pi}. \tag{2}$$

Therefore
$$m_r P_{rs}(dt) = \frac{m_0\bar{c}}{\pi}\,B_{rs}\,dt.$$

But with the assumption of a uniform mean density $m_0$,
$$m_r = m_0 A_r. \tag{3}$$

Hence
$$P_{rs}(dt) = \frac{\bar{c}B_{rs}}{\pi A_r}\,dt. \tag{4}$$

Similarly
$$P_r^*(dt) = \frac{\bar{c}L_r}{\pi A_r}\,dt. \tag{5}$$

Noting that in a state of macroscopic equilibrium the mean number of particles coming into a region $R_r$ from $R^*$ and the other regions in time $dt$ balances the mean number of particles going out of $R_r$ to $R^*$ and the other regions in the same time, and using the equations (3), we obtain the relation

$$\mu' = m'\Lambda. \tag{6}$$
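The factor $1/\pi$ in the crossing rate can be checked by direct quadrature. The sketch below assumes, beyond what the text states explicitly, that the direction of motion is uniformly distributed on $(0, 2\pi)$ independently of the speed, so that the rate is $m_0\bar{c}\,(1/2\pi)\int_0^{\pi}\sin\theta\,d\theta$; the function name and the numerical values are hypothetical.

```python
import math

def crossing_rate(density, mean_speed, steps=10000):
    """Midpoint-rule value of density * mean_speed * (1 / 2 pi) * integral of
    sin(theta) over (0, pi), i.e. the rate of one-way crossings per unit length."""
    h = math.pi / steps
    integral = sum(math.sin((i + 0.5) * h) for i in range(steps)) * h
    return density * mean_speed * integral / (2.0 * math.pi)

rate = crossing_rate(density=5.0, mean_speed=2.0)
expected = 5.0 * 2.0 / math.pi      # m_0 * c / pi, as in equation (2)
```

Only directions with $c_y > 0$ contribute, which is why the angular integral runs over half the circle.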


APPENDIX 3 
Tables of serial and cross-correlations 


Table 4. Values of r(τ) for τ = 0, 1, 2, for the transformed series (defined in § 2.3)

Film 186     r(0)       r(1)       r(2)
r_AA         1.0000     0.7207     0.6517
r_AB        −0.5220    −0.5124    −0.4939
r_AC        −0.2688    −0.3259    −0.2836
r_AD        −0.4906    −0.4987    −0.4619
r_BA        −0.5220    −0.4846    −0.5877
r_BB         1.0000     0.8265     0.7316
r_BC         0.4445     0.4486     0.4509
r_BD         0.5538     0.5801     0.5453
r_CA        −0.2688    −0.2506    −0.2551
r_CB         0.4445     0.4381     0.4457
r_CC         1.0000     0.8914     0.8461
r_CD         0.2682     0.2617     0.2864
r_DA        −0.4906    −0.5356    −0.5269
r_DB         0.5538     0.5508     0.5279
r_DC         0.2682     0.2389     0.2319
r_DD         1.0000     0.8725     0.8102
















Table 5. Values of r_t, for t = 1, 2, ..., 6, for the five series of observations

(Film 191, shot 1, shot 3, shot 4, shot 5, shot 6 of Rothschild's data)

            r_1       r_2       r_3       r_4       r_5       r_6
Shot 1     0.5426    0.3109    0.1976    0.0588    0.0560   −0.0608
Shot 3     0.3755    0.1013    0.0784    0.1050    0.0258    0.0206
Shot 4     0.5606    0.3280    0.2026    0.1806    0.1235    0.1096
Shot 5     0.4938    0.2055   −0.0041   −0.1248   −0.1107   −0.0043
Shot 6     0.4313    0.1976    0.1617   −0.0033   −0.0520    0.0024

| | 
REFERENCES

AITKEN, A. C. (1951). Determinants and Matrices (University Mathematical Texts). Edinburgh.

BARTLETT, M. S. (1949). Some evolutionary stochastic processes. J.R. Statist. Soc. B, 11, 211.

BARTLETT, M. S. (1954). Processus stochastiques ponctuels. Ann. Inst. Poincaré, 14, 35.

BARTLETT, M. S. (1955). An Introduction to Stochastic Processes. Cambridge University Press.

BARTLETT, M. S. & RAJALAKSHMAN, D. V. (1953). Goodness of fit tests for simultaneous autoregressive schemes. J.R. Statist. Soc. B, 15, 107.

CHANDRASEKHAR, S. (1943). Stochastic problems in Physics and Astronomy. Rev. Mod. Phys. 15, 1.

LINDLEY, D. V. (1954). The estimation of the velocity distribution from counts. (Paper read at the Amsterdam Conference, 1954 and to be published in the Proceedings of the Conference.)

MONTROLL, E. W. (1947). On the theory of Markoff chains. Ann. Math. Statist. 18, 18.

QUENOUILLE, M. H. (1947). A large sample test for the goodness of fit of autoregressive schemes. J.R. Statist. Soc. 110, 123.

ROTHSCHILD, LORD (1953a). A new method of measuring sperm speed. Nature, Lond., 171, 512.

ROTHSCHILD, LORD (1953b). A new method of measuring the activity of spermatozoa. J. Exp. Biol. 30, 178.

RUBEN, H. & ROTHSCHILD, LORD (1955). Estimation of mean speeds of organisms and particles by counting (to be published).










TESTING FOR SERIAL CORRELATION IN LEAST 
SQUARES REGRESSION 


By E. J. HANNAN 


Australian National University, Canberra, A.C.T. 


1. INTRODUCTION 


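We shall be concerned with a regression of the form

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{k1} \\ x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & & \vdots \\ x_{1n} & x_{2n} & \cdots & x_{kn} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \tag{1}$$

or $y = X\beta + \epsilon$,

where the $\epsilon_t$ are generated by a stationary process, wholly independent of the vectors in $X$, and have correlation matrix $\Gamma$.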

In a fundamental paper Anderson (1948) considered the case where the $\epsilon_t$ were generated by certain Gaussian processes giving a joint distribution near to that for a stationary simple Markoff process. For the cases he considered, when the column vectors of $X$ are latent vectors of $\Gamma$, a uniformly most powerful (one-sided) test of the hypothesis of serial independence of the $\epsilon_t$ may be obtained from the ratio of a quadratic form in the residuals to the sum of squares of the residuals. The matrix of the quadratic form in the numerator also has the vectors in $X$ as latent vectors and is close to the matrix occurring in the definition of the serial correlation coefficient. His results suggest the use of some form of the serial correlation coefficient as a test statistic in the general case when the regressor vectors are no longer latent vectors of $\Gamma$. Durbin & Watson (1950, 1951) use, for example, the statistic





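$$d = \frac{z'A_d z}{z'z},$$

where $z = [I - X(X'X)^{-1}X']\,y = Qy$

and

$$A_d = \begin{bmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{bmatrix}. \tag{2}$$

The elements of $A_d$ outside the principal diagonal, and the diagonals immediately above and below it, are zeros.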
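The quadratic form $z'A_d z$ is the sum of squared successive differences of the residuals, which gives a convenient way of computing $d$. The sketch below is illustrative only: the function names are hypothetical, and the residuals are taken from a least squares fit on a constant and a linear trend purely as an example of a matrix $X$.

```python
def trend_residuals(y):
    """Least squares residuals from a regression of y on a constant and t = 1, ..., n."""
    n = len(y)
    t = [float(i + 1) for i in range(n)]
    t_bar, y_bar = sum(t) / n, sum(y) / n
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))
    return [yi - y_bar - slope * (ti - t_bar) for ti, yi in zip(t, y)]

def durbin_watson(z):
    """d as the ratio of the sum of squared successive differences to the sum of squares."""
    num = sum((z[i] - z[i - 1]) ** 2 for i in range(1, len(z)))
    return num / sum(v * v for v in z)

def durbin_watson_quadratic_form(z):
    """The same statistic written as z' A_d z / z' z with A_d built explicitly."""
    n = len(z)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0 if i in (0, n - 1) else 2.0
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = -1.0
    num = sum(z[i] * a[i][j] * z[j] for i in range(n) for j in range(n))
    return num / sum(v * v for v in z)

z = trend_residuals([2.0, 3.5, 3.0, 5.5, 5.0, 7.5, 7.0, 9.5])
d = durbin_watson(z)
```

For a strictly alternating residual series such as `[1, -1, 1, -1]` the statistic is 3, near the upper end of its range, while positively correlated residuals give values near 0.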


Durbin & Watson reduced $d$ to the canonical form

$$d = \frac{\sum_{i=1}^{n-k} \nu_i \zeta_i^2}{\sum_{i=1}^{n-k} \zeta_i^2},$$

where the $\nu_i$ are the latent roots of $QA_d$ (other than $k$ zeros) and the $\zeta_i$ are independent standard normal variates when the $\epsilon_t$ are independent normal variates. They showed, moreover, that, ordering the $\nu_i$ with respect to size,

$$\lambda_i \le \nu_i \le \lambda_{i+k-s},$$

where the $\lambda_i$ are the latent roots of $A_d$ (also ordered with respect to size) other than $s$ which correspond to regressor vectors which are latent vectors of $A_d$.
They then tabulated, for various $n$ and $k$, the significance points of

$$d_L = \frac{\sum_{i=1}^{n-k} \lambda_i \zeta_i^2}{\sum_{i=1}^{n-k} \zeta_i^2} \quad \text{and} \quad d_U = \frac{\sum_{i=1}^{n-k} \lambda_{i+k-1} \zeta_i^2}{\sum_{i=1}^{n-k} \zeta_i^2}, \tag{3}$$

which correspond to the case where $s = 1$, and the only latent vector is that vector composed entirely of units, which corresponds to the constant term in the regression. These significance points then provide bounds to the significance point for $d$.

In general the $\nu_i$ depend upon the regressor vectors, the only simple case being, again, that where the regressors are latent vectors of $A_d$. In cases where the bounds test is inconclusive they recommend the use of $\tfrac{1}{4}d$ as a Beta variate with the appropriate mean and variance.

An alternative test was suggested in Hannan (1955), but, while this test is asymptotically fully efficient, when $n$ is small and $k$ large relative to $n$, end-effects reduce its power very greatly. In these circumstances the bounds test is also unsatisfactory, however, since the bounds are very far apart. For example, with $n = 30$ and $k = 6$ the bounds are 1.07 and 1.83, although the range of variation of $d$ is only from 0 to 4 (approximately).

As has already been seen, the latent vector case is of primary importance in connexion with the test for serial independence of the residuals. It is also of great importance in the estimation of $\beta$, for when $X$ is composed of latent vectors of $\Gamma$ it is well known that the straightforward least squares (L.S.) estimates are, numerically, the same as the best linear unbiased (B.L.U.) estimates (obtained when $\Gamma$ is known). Grenander (1954) and Grenander & Rosenblatt (1954) have investigated the conditions under which these two differing estimation procedures give, asymptotically, estimates having the same covariance matrix, no matter what the stationary process generating the $\epsilon_t$ may be. Their conditions, which will be given in more detail in the next section, are met by regressions on certain analytic functions (for equidistant values of the argument), including the orthogonal (Legendre) polynomials, the trigonometric functions and functions of the form $t^p\cos\theta t$, $t^p\sin\theta t$. The analogy with the latent vector case suggests that when Grenander & Rosenblatt's theorem applies the distribution theory of statistics such as $d$ can be simplified, and this is investigated in § 2.

Most attention will be paid to the orthogonal polynomials, since the regression on trigonometric functions has been dealt with exactly by Anderson & Anderson (1950) and other cases where Grenander & Rosenblatt's theorem holds seem less interesting. It should be mentioned that much of what is contained in § 3 (relating to the orthogonal polynomials) is already implicit in Durbin & Watson's work. (See, for example, Durbin & Watson (1951), pp. 171 and 172.)

In the final section of the paper the problem of testing for serial correlation in the residuals 
from a regression which includes trend terms as well as stochastic variables is considered. 



E. J. HANNAN 59 


2. REGRESSORS WHOSE SPECTRAL DISTRIBUTION FUNCTION IS A STEP FUNCTION 


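Grenander (1954) and Grenander & Rosenblatt (1954) consider the case of regression on k
vectors (of n elements each), $\phi_1, \phi_2, \ldots, \phi_k$, generated by a process such that

\[
\begin{aligned}
&\text{(i)}\quad \lim_{n\to\infty}\sum_{t=1}^{n}\phi_u^2(t)=\infty \qquad (u=1,\ldots,k),\\
&\text{(ii)}\quad \lim_{n\to\infty}\frac{\sum_{t=1}^{n+1}\phi_u^2(t)}{\sum_{t=1}^{n}\phi_u^2(t)}=1 \qquad (u=1,\ldots,k),\\
&\text{(iii)}\quad \lim_{n\to\infty}\frac{\sum_{t=1}^{n}\phi_u(t+h)\,\phi_v(t)}{\bigl\{\sum_{t=1}^{n}\phi_u^2(t)\sum_{t=1}^{n}\phi_v^2(t)\bigr\}^{1/2}}=r_{uv}(h)\ \text{exists}\qquad (u,v=1,\ldots,k).
\end{aligned}
\tag{4}
\]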


The $\phi_u(t)$ are considered merely as known sequences of real numbers, no other restrictions
on the process generating them being made.
Grenander & Rosenblatt then show that

\[ R_h = [r_{uv}(h)] = \int_{-\pi}^{\pi} e^{ih\theta}\,dM(\theta), \tag{5} \]

where $M(\theta) = [m_{uv}(\theta)]$ is a matrix of functions with $M(\theta_1)-M(\theta_2)$ (Hermitian)
non-negative definite for $\theta_1>\theta_2$. It is presumed that $R_0 = M(\pi)-M(-\pi)$ is non-singular
(to avoid a kind of asymptotic …). $M(\theta)$ is called the spectral distribution
function of the $\phi_u$. Since the $\phi_u(t)$ are real we shall have $dM(\theta) = \overline{dM(-\theta)}$. Introducing
the (matrix) function $N(\theta) = M(\theta+)-M(-\theta-)$ (which is real and symmetric) the regression
spectrum S is defined as the set of points in 0 to $\pi$ for which $dN(\theta)$ is not the null matrix.
It is then shown by Grenander & Rosenblatt that the necessary and sufficient condition
that the L.S. and B.L.U. estimates of $\beta$ should have, asymptotically, the same covariance
matrix is that S should consist of $q\le k$ points of increase $\theta_1, \theta_2, \ldots, \theta_q$ for which

\[
dN(\theta_i)\,N(\pi)^{-1}\,dN(\theta_j) =
\begin{cases}
\text{null matrix} & (i\ne j),\\
dN(\theta_i) & (i=j).
\end{cases}
\tag{6}
\]

If we now put $T(\theta_j) = N(\pi)^{-1}\,dN(\theta_j)$, we shall have

\[ \sum_{j} T(\theta_j) = N(\pi)^{-1}\int_0^{\pi} dN(\theta) = I, \]

and it follows from (6) that the $T(\theta_j)$ are idempotent.

Grenander & Rosenblatt's theorem is proved under certain regularity conditions on the
spectral function of the process generating the $\varepsilon_t$. In particular, the spectral density must
not have a zero at any of the points $\theta_j$. An example where the failure of this last condition
makes the theorem, quoted above, invalid is given in Grenander (1951b, p. 568).

To apply this result to the problem of testing for serial correlation in the residuals (when
on the null hypothesis the $\varepsilon_t$ are independent normal variates with zero mean) let us
consider the case where the test statistic is

\[ r = \frac{z'Az}{z'z}, \]











60 Serial correlation in least squares regression 


z being given by (2) and $A = [a_{u-v}]$ by

\[ a_{u-v} = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(\theta)\,e^{i(u-v)\theta}\,d\theta, \tag{7} \]


where g(@) is a bounded (integrable) even function. 
In practice the matrix A will be such that $\{g(\theta)\}^s$ (s a positive integer) may be expanded
in a convergent Fourier series. For example, $A_d$ is near to a matrix for which

\[ g(\theta) = 2(1-\cos\theta). \]

Then we shall have

\[ a^{(s)}_{uv} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\{g(\theta)\}^s\,e^{i(u-v)\theta}\,d\theta, \]

where $a^{(s)}_{uv}$ is the element in the uth row and vth column of the sth power of the doubly
infinite matrix obtained from A by allowing u and v to run from $-\infty$ to $\infty$. Moreover, by
a theorem due to Szegö (see Grenander, 1951a) it follows that the latent roots of A are
asymptotically the equidistributed ordinates of $g(\theta)$, in the sense that

\[ \lim_{n\to\infty}\frac{1}{n}\bigl\{\lambda_n^s(1)+\lambda_n^s(2)+\ldots+\lambda_n^s(n)\bigr\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\{g(\theta)\}^s\,d\theta, \tag{8} \]

where the $\lambda_n(j)$ are the latent roots of A and lie in the range of variation of $g(\theta)$.
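The equidistribution statement (8) can be checked numerically for g(θ) = 2(1 − cos θ). The sketch below rests on two facts that are not stated in the text and are assumptions here: that the matrix in question is the n × n first-difference matrix of the d statistic, whose latent roots have the closed form 2{1 − cos(πj/n)} (j = 0, ..., n − 1), and that the integral on the right of (8) then equals the binomial coefficient C(2s, s). The function names are hypothetical.

```python
import math


def first_difference_roots(n):
    """Latent roots 2(1 - cos(pi*j/n)), j = 0..n-1 (assumed closed form)."""
    return [2.0 * (1.0 - math.cos(math.pi * j / n)) for j in range(n)]


def mean_root_power(n, s):
    """Left-hand side of (8) before taking the limit: (1/n) * sum of lambda^s."""
    return sum(lam ** s for lam in first_difference_roots(n)) / n


def szego_limit(s):
    """(1/2pi) * integral of {2(1 - cos t)}^s over (-pi, pi), equal to C(2s, s)."""
    return math.comb(2 * s, s)


for s in (1, 2, 3):
    print(s, mean_root_power(2000, s), szego_limit(s))
```

For s = 1 the finite-n average is 2 − 2/n, so the approach to the limit is of order 1/n.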
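Consider the quantity

\[ \frac{\phi_u' A\,\phi_v}{\{\phi_u'\phi_u\;\phi_v'\phi_v\}^{1/2}}, \]

where the $\phi_u$ satisfy the conditions of Grenander & Rosenblatt's theorem. This is easily
seen to converge, as n increases, to

\[ \sum_{h=-\infty}^{\infty} a_h\,r_{uv}(h). \]

We then have, from (5),

\[
\begin{aligned}
\lim_{n\to\infty}\frac{\phi_u' A\,\phi_v}{\{\phi_u'\phi_u\;\phi_v'\phi_v\}^{1/2}}
&= \sum_{h} a_h\int_{-\pi}^{\pi} e^{ih\theta}\,dm_{uv}(\theta)\\
&= \int_{-\pi}^{\pi} g(\theta)\,dm_{uv}(\theta)\\
&= \int_{0}^{\pi} g(\theta)\,dn_{uv}(\theta)\\
&= \sum_{j=1}^{q} g(\theta_j)\,n_{uv}(j),
\end{aligned}
\]

where $n_{uv}(j)$ is the jump in $n_{uv}(\theta) = m_{uv}(\theta)-m_{uv}(-\theta)$ at $\theta=\theta_j$.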
In the same way it may be shown that we shall also have

\[ \lim_{n\to\infty}\frac{\phi_u' A^s\,\phi_v}{\{\phi_u'\phi_u\;\phi_v'\phi_v\}^{1/2}} = \sum_{j=1}^{q}\{g(\theta_j)\}^s\,n_{uv}(j). \tag{9} \]


As Durbin & Watson (1950) show, the moments of r may be obtained from the quantities

\[ \sum_i \nu_i^s = \operatorname{Trace}\,(QA)^s \]

(see also Whittle, 1951). Here the $\nu_i$ are the latent roots of QA while Q was defined in (2).





We may without loss of generality take the vectors $x_i$ in (1) to be orthonormal. We
then require

\[ \operatorname{Tr}\Bigl[A-\sum_i x_i x_i' A\Bigr]^s. \]
i 


For ease of printing in what follows it will be convenient to alter our notation, putting
x(i) for $x_i$, n(u, v, j) for $n_{uv}(j)$, g(j) for $g(\theta_j)$ and T(j) for $T(\theta_j)$.

We may expand the matrix product just written and evaluate the trace as the sum of a
number of traces. Remembering that the trace of the product of two matrices is independent
of the order of their multiplication we may consider the typical term in this sum as

\[ \operatorname{Tr}\bigl\{A^a\,(\textstyle\sum_i x(i)x(i)'A)^b\,A^c \ldots (\textstyle\sum_i x(i)x(i)'A)^f\bigr\}, \]

where a+b+c+...+f = s. Putting the sum of the powers of the terms $\sum_i x(i)x(i)'A$ equal
to $t\le s$ we may write this as

\[ \sum\,\{x(i_1)'A^{a+1}x(i_2)\}\,\{x(i_2)'A\,x(i_3)\}\ldots\{x(i_u)'A^{c+1}x(i_{u+1})\}\ldots\{x(i_t)'A\,x(i_1)\}, \]


where the summation is over all $i_1, i_2, \ldots, i_t$ running from 1 to k. In the limit, if the x(i)
satisfy the conditions of Grenander & Rosenblatt's theorem, we have from (9)

\[ \sum\Bigl\{\sum_{j=1}^{q} g(j)^{a+1}n(i_1,i_2,j)\Bigr\}\Bigl\{\sum_{j=1}^{q} g(j)\,n(i_2,i_3,j)\Bigr\}\ldots\Bigl\{\sum_{j=1}^{q} g(j)^{c+1}n(i_u,i_{u+1},j)\Bigr\}\ldots\Bigl\{\sum_{j=1}^{q} g(j)\,n(i_t,i_1,j)\Bigr\}. \]

Changing the order of summation we get

\[ \sum\Bigl\{g(j_1)^{a+1}g(j_2)\ldots g(j_u)^{c+1}\ldots g(j_t)\,\sum n(i_1,i_2,j_1)\,n(i_2,i_3,j_2)\ldots n(i_u,i_{u+1},j_u)\ldots n(i_t,i_1,j_t)\Bigr\}, \]

where the outer summation is over $j_1, j_2, \ldots, j_t$ running from 1 to q and the inner one is over
$i_1, i_2, \ldots, i_t$ running from 1 to k. But the inner summation is seen to be

\[
\operatorname{Tr}\,T(j_1)\,T(j_2)\ldots T(j_t)
\begin{cases}
= 0 & \text{unless } j_1=j_2=\ldots=j_t,\\
= \operatorname{Tr}\,T(j) & \text{if } j_1=j_2=\ldots=j_t=j,
\end{cases}
\]

since $N(\pi) = R(0) = I$ (from (6) and the fact that the T(j) are idempotent). Hence

\[ \lim_{n\to\infty}\operatorname{Tr}\bigl\{A^a\,(\textstyle\sum_i x(i)x(i)'A)^b\,A^c\ldots(\textstyle\sum_i x(i)x(i)'A)^f\bigr\} = \sum_{j=1}^{q} g(j)^s\,\operatorname{Tr}\,T(j) \]

and is independent of the partition of s into integers a, b, c, ..., f provided t > 0.
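Reverting to our original notation we then have

\[
\begin{aligned}
\lim_{n\to\infty}\operatorname{Tr}\Bigl[A-\sum_i x_i x_i' A\Bigr]^s
&= \operatorname{Tr}A^s+\sum_{t=1}^{s}(-1)^t\binom{s}{t}\sum_{j=1}^{q}\{g(\theta_j)\}^s\operatorname{Tr}\,T(\theta_j)\\
&= \operatorname{Tr}A^s-\sum_{j=1}^{q}\{g(\theta_j)\}^s\operatorname{Tr}\,T(\theta_j).
\end{aligned}
\tag{10}
\]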










It is easy to see that $\operatorname{Tr}\,T(\theta_j)$ is invariant under the orthonormalization so that this
formula is perfectly general. In fact under the orthonormalization the $T(\theta_j) = N(\pi)^{-1}dN(\theta_j)$
will be replaced by $P'dN(\theta_j)P$, where

\[ P'N(\pi)P = I. \]

It can be easily seen that the $P'dN(\theta_j)P$ annihilate each other and are idempotent.
They are also symmetric and non-negative definite, however, so that each must have
units in certain places in the principal diagonal and zeros elsewhere, no two having unity
in the same place. (This fact could be used to obtain an alternative proof of (10).) It follows
therefore that

\[ \operatorname{Tr}\,P'dN(\theta_j)P = \operatorname{Tr}\,N(\pi)^{-1}dN(\theta_j) = \operatorname{Tr}\,T(\theta_j) = p_j \quad (p_j\ \text{integral}), \]

where $\sum_{j=1}^{q} p_j = k$.
1 
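We then have, under the conditions of Grenander & Rosenblatt's theorem,

\[
\begin{aligned}
\lim_{n\to\infty}\bigl\{n^{-1}\operatorname{Tr}\,(QA)^s\bigr\}
&= n^{-1}\operatorname{Tr}A^s-n^{-1}\sum_{j=1}^{q}p_j\{g(\theta_j)\}^s\\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}\{g(\theta)\}^s\,d\theta-n^{-1}\sum_{j=1}^{q}p_j\{g(\theta_j)\}^s
\end{aligned}
\]

from (8), and it can be seen that, asymptotically, the distribution of r is the same as that
which would obtain if the regression vectors had been latent vectors corresponding to latent
roots $g(\theta_j)$, repeated $p_j$ times. As n increases, of course, $g(\theta_j)$ will be arbitrarily near to $p_j$
latent roots of A.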
For example, if

\[ \phi_{2j-1}(t) = \sin\frac{2\pi q_j t}{n},\qquad \phi_{2j}(t) = \cos\frac{2\pi q_j t}{n}\qquad (j=1,\ldots,q), \]

then $R_h$ has $\cos(2\pi q_j h/n)$, repeated twice, in the principal diagonal and zeros elsewhere so
that $dT(\theta)$ is null except at $\theta = 2\pi q_j/n$, where it has two units in the principal diagonal and
zeros elsewhere. If as A we choose $A_c$, the matrix occurring in the definition of the circular
serial correlation coefficient (see Anderson & Anderson, 1950), we obtain

\[ \lim_{n\to\infty}\Bigl\{\frac{1}{n}\operatorname{Tr}\,(QA_c)^s\Bigr\} = \frac{1}{n}\operatorname{Tr}A_c^s-\frac{1}{n}\sum_{j=1}^{q}2\Bigl(\cos\frac{2\pi q_j}{n}\Bigr)^s, \]

since $g(\theta)$ now becomes $\cos\theta$. For suitable $q_j$ the statistic $r_c = z'A_c z/z'z$ may be referred
to the tables prepared by Anderson & Anderson. The validity of the approximation for
small n would need to be checked, of course. The example is not very realistic however, and 
will not be considered further here. A more real case would be that where p is allowed to 
run through the integers from 0 to k (say). Now, however, we shall have 


’ 


lim 7 Tr(QA,)' ‘ AT At—(k+1)¥ 2(c0s nti)’ 


n> 1 





and this does not correspond to any statistic whose significance points are at present 
tabulated. 




3. REGRESSION ON THE ORTHOGONAL POLYNOMIALS 


Here we shall use the matrix $A_d$. Though $A_d$ does not satisfy (7) it is very near to a matrix
which does and the difference will not matter asymptotically. We now have, in the limit,
$g(\theta) = 2(1-\cos\theta)$. For the orthogonal polynomials there is only one point of increase in
$N(\theta)$, at the origin, for which T(0) is the unit matrix so that

\[ \operatorname{Tr}\,[QA_d]^s \to \operatorname{Tr}A_d^s-k\{g(0)\}^s = \operatorname{Tr}A_d^s. \]

Since d is necessarily less than $d_U$ (see (3)) and the roots $\lambda_1, \lambda_2, \ldots, \lambda_{k-1}$ all tend to zero as
$n\to\infty$, it appears that the effect of the regression is to eliminate the k smallest roots from the
spectrum of $A_d$ and the appropriate significance point is, asymptotically, that given by the
upper bound to the significance point as tabulated by Durbin & Watson (1951). (Note
that in their tables they put k' = k − 1.)

To check on the adequacy of the approximation for n small the mean $\mu$ and variance
$\sigma^2$ of d and $d_U$ have been calculated for certain n and k' (= k − 1) and are shown in Table 1.

















Table 1

                  k' = 1             k' = 3             k' = 5
  n             μ       σ²         μ       σ²         μ       σ²

 15    d      2·150   0·225      2·474   0·194      2·798   0·145
       d_U    2·150   0·224      2·491   0·190      2·860   0·137
 20    d      2·110   0·177      2·345   0·164      2·578   0·147
       d_U    2·110   0·177      2·354   0·162      2·620   0·137
 25    d      2·086   0·146      2·271   0·139      2·456   0·130
       d_U    2·086   0·146      2·275   0·138      2·481   0·124
 30    d      2·071   0·124      2·222   0·120      2·376   0·114
       d_U    2·071   0·124      2·224   0·119      2·392   0·111
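The entries for the upper bound in Table 1 can be reproduced approximately by a short calculation. This is a sketch under assumptions not spelled out in this section: that the non-zero latent roots of the first-difference matrix are 2{1 − cos(πi/n)} (i = 1, ..., n − 1), that the upper-bound statistic uses the m = n − k' − 1 largest of them, and that the mean and variance of a ratio of this kind are P/m and 2(Q − P·mean)/{m(m + 2)}, where P and Q are the sums of the roots and of their squares. The function name is hypothetical.

```python
import math


def upper_bound_moments(n, k_prime):
    """Mean and variance of the upper-bound statistic (assumed moment formulas)."""
    # Non-zero roots of the first-difference matrix, in increasing order.
    roots = [2.0 * (1.0 - math.cos(math.pi * i / n)) for i in range(1, n)]
    kept = roots[k_prime:]           # the k' smallest non-zero roots are eliminated
    m = n - k_prime - 1              # number of roots retained
    p_sum = sum(kept)
    q_sum = sum(v * v for v in kept)
    mean = p_sum / m
    variance = 2.0 * (q_sum - p_sum * mean) / (m * (m + 2))
    return mean, variance


for n in (15, 20, 25, 30):
    for k_prime in (1, 3, 5):
        mean, variance = upper_bound_moments(n, k_prime)
        print(n, k_prime, round(mean, 3), round(variance, 3))
```

The corresponding rows for d itself depend on the actual regressors (the orthogonal polynomials) and are not reproduced by this sketch.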











It appears that the true significance point will not differ very greatly from the significance
point for $d_U$. Even in the case n = 15, k' = 5 the deviation will be only about 0·06. (Here
the 5% bounds are 0·56 and 2·21.) Having regard to the certainty of deviations from
normality and the fact that critical decisions will rarely be made when the observed value
falls very near to the significance point, it seems that the upper bound to the significance
point as tabulated by Durbin & Watson may nearly always be used as an adequate
approximation to the true significance point. When the observed value is very near to this upper
bound the significance point could always be located more accurately by using $\tfrac14 d$ as a
Beta variate, as suggested by Durbin & Watson.
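Fitting a Beta variate with the appropriate mean and variance amounts to matching two moments. The sketch below assumes the variate is d/4 on (0, 1), so that its mean is μ/4 and its variance σ²/16, and solves for the two Beta parameters; the function name is hypothetical and the step from the fitted parameters to a percentage point (which needs the incomplete Beta function) is left out.

```python
def beta_parameters_for_quarter_d(mean_d, var_d):
    """Moment-match a Beta(p, q) distribution to d/4 (a sketch, not a table)."""
    e = mean_d / 4.0                    # mean of d/4
    v = var_d / 16.0                    # variance of d/4
    total = e * (1.0 - e) / v - 1.0     # p + q from the Beta variance formula
    return total * e, total * (1.0 - e)


# Mean and variance of d for n = 15, k' = 5, as given in Table 1.
p, q = beta_parameters_for_quarter_d(2.798, 0.145)
print(p, q)
print(p / (p + q) * 4.0)                                # recovers the mean of d
print(p * q / ((p + q) ** 2 * (p + q + 1.0)) * 16.0)    # recovers the variance of d
```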

The work of Durbin & Watson showed, of course, that the bounds to the significance point
differed from the true significance point by quantities of the order $n^{-1}$. What we have here
done is to show that in the case of the orthogonal polynomials the upper bounds differ from
the true significance points by quantities of the order $n^{-2}$.










4. MIXED REGRESSION ON ORTHOGONAL POLYNOMIALS AND RANDOM VARIATES 


A situation which is likely to arise in practice is that in which a regression is to be computed 
and all the variates contain a trend component. A standard procedure is then to include the 
orthogonal polynomials in the regression up to the order needed to eliminate the most 
complicated trend. This will often result in a regression on a large number of regressors, so 
that the bounds test for the serial correlation of the residuals will be inconclusive. For an 
example see Quenouille (1952, pp. 182-6). 

Consider the case of a regression on k vectors $\phi_i$ satisfying the conditions of Grenander's
theorem and p vectors $x_i$ which are unrestricted. We may, without loss of generality, take
the whole set $\phi_i$, $x_i$ as orthonormal. Then if the non-zero latent roots of


are Py fas +++» Pn—k—p» 
k 

while those of I! -> 49; ¢;| A are 
5 eo. ae 

Durbin & Watson (1950) show that, when each set is numbered so that its members are 

increasing, 


‘ 


z' AZ. 
Thus a lower bound to r = a8 





n—k—p 
x 4G 
2) Ee zs (11) 
"a 
We have from (10) 
n—k—p q p-1 
lim D v= TrAs— > {g(,)}TrT(6,)— 5 v4; 
n—>o i=1 1 j=0 
n qa p-1 
= BA; = ‘ {9(9;)}8 Tr T(;) = x Vo-k- 
‘a 


But from the fact that ASYM S Ape 


p-1 p-1 
it is clear that > v,_,_, will differ from © A,_; by a quantity of order n—' so that to our order 
j=0 0 


of approximation to the moments of 7, we may put 


n p 


org n—p q 
x v= x Aj- ~ {9(9;)}° Tr T(G;). 
In the case where Tr T(@;) is the appropriate multiplicity of the root corresponding to 
g(9;) a lower bound to the significance point for r may therefore be obtained from 
»_ VAG 


sf VG ’ 


where in ~’ the terms corresponding to the latent roots other than the p greatest and those 
corresponding to the g(9;) appear. 







For the case of the orthogonal polynomials (using A,) this bound becomes 


n p 


—k— 
2 
= N54 aGF 


= (12) 








(the latent root, zero, corresponding to the constant term having been omitted, to accord 
with Durbin & Watson’s notation, as in (3)). 


n—k n—k 
This is not a true lower bound, for finite n since (for example) } v;< ¥ A;,,_;. However, 
1 
the considerations of the previous section indicate that the error involved in using it will 


be very small. Moreover, one can only be wrong in using it as a lower bound when the p
vectors $x_i$ are near to the latent vectors corresponding to the largest latent roots of $A_d$.
This implies, for example, high negative serial correlation and would probably be recognized 
before the analysis was commenced. However, this discussion is largely academic, since 
the significance points for d; in (12) are not tabulated. Until such time as the necessary work 


can be done an expedient (which will narrow the bounds, though not as much as d; will 
narrow them) is to use 





This is obtained from (12) by adding back the terms corresponding to the (k— 1) smallest 
latent roots (the kth vector in the set being the vector corresponding to the constant term). 
For the range of values of n, p and k occurring in practice this will provide a lower bound 
for d;.* The significance points for d7 are tabulated by Durbin & Watson (see (3)) and can 
be obtained from their tables for d, putting p = k’. 

We shall close this section with an example which originally suggested the problem. In 
Quenouille (1952, p. 183) an example is given of the regression of U.S. fertilizer consumption 
on an index of farm income over the years 1911-47. The two series contain a decided trend 
which appears to need a fifth degree polynomial for its removal. In this case, therefore, 
p=1,k-1=k' = 5and n = 37. In Table 2 the 5 % points for the four statistics d,, d7, d; 
and d,, are shown for these p, k and n. 


Table 2. 5% points 
d, a’ di d, 
1·14 1·42 1·75 1·88


These significance points were obtained by using a Beta distribution with the appropriate
mean and variance but will certainly not differ from the exact values by more than 0·02.

Quenouille quotes the first serial correlation of the residuals as 0·229, which corresponds
to a value of d slightly less than 1·54. Using Durbin & Watson's original bounds no
conclusion may be reached. Using the conservative lower bound (obtained from Durbin &
Watson's tables) this is still so. However, using the bound given by (12) it can unquestionably

* For $kn^{-1}<\tfrac14$ this will certainly be so.




be said that the serial correlation is significant at the 5 % point (as Quenouille conjectured). 
In this case the random regressor is positively serially correlated and there can be no doubt 
of the validity of the result based on d;. 
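The comparison made in this example can be laid out as a small calculation. It assumes the familiar approximation d ≈ 2(1 − r₁) between d and the first serial correlation r₁ (which neglects end terms, so the exact value differs slightly), and it assumes that the four 5% points of Table 2 are, in order, the original lower bound, the conservative bound taken from the tables, the bound given by (12), and the upper bound. The descriptive labels and the function name are hypothetical.

```python
def d_from_first_serial_correlation(r1):
    """Approximate d from the first serial correlation, neglecting end terms."""
    return 2.0 * (1.0 - r1)


# 5% points from Table 2 (p = 1, k' = 5, n = 37); the labels are descriptive only.
five_percent_points = {
    "original lower bound": 1.14,
    "conservative bound from tables": 1.42,
    "bound given by (12)": 1.75,
    "upper bound": 1.88,
}

d = d_from_first_serial_correlation(0.229)
for label, point in five_percent_points.items():
    verdict = "d falls below this point" if d < point else "d does not fall below this point"
    print(f"{label}: {point:.2f} -> {verdict}")
```

Only when d falls below a lower bound can positive serial correlation be declared significant, and of the three lower bounds only the one from (12) exceeds the observed value.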


I should like to thank Dr G. S. Watson for some helpful discussion in connexion with this
work.


REFERENCES 


ANDERSON, T. W. (1948). Skand. AktuarTidskr. 31, 88.
ANDERSON, R. L. & ANDERSON, T. W. (1950). Ann. Math. Statist. 21, 59.
DURBIN, J. & WATSON, G. S. (1950). Biometrika, 37, 409.
DURBIN, J. & WATSON, G. S. (1951). Biometrika, 38, 159.
GRENANDER, U. (1951a). Ark. Mat. 1, 503.
GRENANDER, U. (1951b). Ark. Mat. 1, 555.
GRENANDER, U. (1954). Ann. Math. Statist. 25, 252.
GRENANDER, U. & ROSENBLATT, M. (1954). Proc. Nat. Acad. Sci., Wash., 40, 812.
HANNAN, E. J. (1955). Biometrika, 42, 133.
QUENOUILLE, M. H. (1952). Associated Measurements. London: Butterworth.
WHITTLE, P. (1951). Hypothesis Testing in Time Series Analysis. Uppsala: Almqvist and Wiksells.







ON THE ANALYSIS OF MULTIPLE REGRESSION IN k CATEGORIES 


By S. KULLBACK AND H. M. ROSENBLATT*
The George Washington University and Bureau of Ordnance, U.S. Navy Dept.†


SUMMARY. The general model, an information theoretic approach and solution to problems of test 
of hypotheses concerning sets of partial regression coefficients from k categories each involving p+ 1 
variates, is presented and applied to certain data. The significance test is the analysis of variance 
ratio. It is shown that Carter’s (1949) problem involving a ‘correlation effect’ among the ith members 
of each category reduces to a special case. The case of stochastic dependence among categories has been 
included by Kullback (1956) in his discussion of the multivariate linear hypothesis. 


1. INTRODUCTION 


We first consider the problem of the normal linear hypothesis (Kolodziejczyk, 1935) from
the viewpoint of information theory, and then apply the results and procedure to the
problem of tests of hypotheses about sets of partial regression coefficients. It is believed that
this approach has some merit, at least from the pedagogical point of view, and has served
as the basis for teaching some aspects of Design of Experiments at the George Washington
University. Although the general theory given has wider applicability, the specific results
given here may serve some needs for k category multiple regression analysis. Aside from
Carter's (1949) paper, the k category multiple regression case does not seem to be covered
in the literature, and has not been found in recent texts on statistical theory or method
(cf. Kempthorne, 1952; Kendall, 1946; Pearson & Wilks, 1933; Rao, 1952; Welch, 1935;
Williams, 1953). Study of the problem by Rosenblatt (1953) was stimulated by the needs
of an applied problem at the Naval Ordnance Laboratory, White Oak, Md. Complete
results and a general theory are presented here. Matrix notation and theory are used and
the results are illustrated by application to certain numerical data. Matrices will be denoted
by upper case bold face type, e.g. $A = (a_{ij})$, $X_r = (x_{rij})$, etc. (i = 1, 2, ..., m; j = 1, 2, ..., n),
the vectors or 1 row or column matrices will be denoted by lower case bold face type, e.g.
$x' = (x_1, x_2, \ldots, x_p)$, $\mu_1' = (\mu_{11}, \mu_{12}, \ldots, \mu_{1p})$, etc.


1·1. Information theory approach. Let $f_1(x)$ and $f_2(x)$ be the probability densities of
populations specified by the statistical hypotheses $H_1$ and $H_2$, respectively. The mean
information per observation from the population with probability density $f_1(x)$ for
discrimination for $H_1$ against $H_2$, is defined by‡

\[ I(1:2) = \int f_1(x)\,\log\frac{f_1(x)}{f_2(x)}\,dx, \tag{1·1} \]

where x can be taken as a vector variable (multivariate). The divergence between $H_1$ and $H_2$,
a measure of the difficulty of discriminating between them, is defined by‡

\[ J(1,2) = \int \{f_1(x)-f_2(x)\}\,\log\frac{f_1(x)}{f_2(x)}\,dx. \tag{1·2} \]
* Now with the Office of Naval Research, U.S. Navy Department, Washington, D.C.

† Presented 24 March 1955 at a Bureau of Ordnance Seminar of Statisticians held at the University
of Chicago.

‡ See Kullback & Leibler (1951) for the case of general populations.










68 Analysis of multiple regression in k categories 


For the special case when $f_1(x)$ and $f_2(x)$ are p-variate normal densities, with the same
matrix of variances and covariances, $\Sigma$, and mean values

\[ E_1(x) = \mu_1 = (\mu_{11}, \mu_{12}, \ldots, \mu_{1p}) \quad\text{and}\quad E_2(x) = \mu_2 = (\mu_{21}, \mu_{22}, \ldots, \mu_{2p}), \]

it is found (Kullback, 1952) that (1·1) and (1·2) yield

\[ 2I(1:2) = J(1,2) = (\mu_1-\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2), \tag{1·3} \]

where the right-hand side of (1·3) is proportional to Mahalanobis's generalized distance
(see, for example, Rao, 1952).
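The relation (1·3) can be checked numerically in the simplest case of two univariate normal densities with a common variance, where it reduces to 2I(1:2) = J(1, 2) = (μ₁ − μ₂)²/σ². The sketch below evaluates the integrals (1·1) and (1·2) by the trapezoidal rule; the function names, the integration range and the parameter values are illustrative assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal variate with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def information_and_divergence(mu1, mu2, sigma, half_width=12.0, steps=20000):
    """Trapezoidal estimates of I(1:2) and J(1,2) for two normals, common sigma."""
    lo = min(mu1, mu2) - half_width * sigma
    hi = max(mu1, mu2) + half_width * sigma
    h = (hi - lo) / steps
    info = 0.0
    div = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        f1 = normal_pdf(x, mu1, sigma)
        f2 = normal_pdf(x, mu2, sigma)
        # log(f1/f2) written in closed form to avoid log(0) in the far tails
        log_ratio = ((x - mu2) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)
        info += weight * f1 * log_ratio * h
        div += weight * (f1 - f2) * log_ratio * h
    return info, div


info, div = information_and_divergence(0.0, 1.5, 2.0)
print(info, div, (0.0 - 1.5) ** 2 / 2.0 ** 2)
```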


1-2. The linear hypothesis—single category. We now consider the set of n observations 
on p+1 variates

\[ z = y - X\beta, \tag{1·4} \]

where $z' = (z_1, z_2, \ldots, z_n)$, $y' = (y_1, y_2, \ldots, y_n)$,
$\beta' = (\beta_1, \beta_2, \ldots, \beta_p)$, $X = (x_{ir})$ (i = 1, 2, ..., n; r = 1, 2, ..., p; p < n),
such that

(a) the z's are independent, normally distributed variables with zero means and
variance $\sigma^2$,

(b) the $x_{ir}$'s are considered to be known,

(c) X is of rank p,

(d) $\beta = \beta^1$ and $\beta = \beta^2$ are parameter matrices whose values are respectively given by
the hypotheses $H_1$ and $H_2$,

(e) $E_1(y) = X\beta^1$ and $E_2(y) = X\beta^2$.

It is seen that (1·3) yields, for this case,

\[
\begin{aligned}
J(1,2) &= (X\beta^1-X\beta^2)'\,(\sigma^2 I)^{-1}(X\beta^1-X\beta^2)\\
&= (\beta^1-\beta^2)'\,S\,(\beta^1-\beta^2)/\sigma^2,
\end{aligned}
\tag{1·5}
\]

where $S = X'X$ is a p × p matrix of rank p and I is the n × n identity matrix.

Suppose $H_1$ imposes no restriction on $\beta$ and $H_2$ provides some specific hypothetical value
$\beta = \beta^2$. We estimate J(1, 2) by replacing the parameters by the best unbiased estimates
appropriate to the hypotheses. That this procedure is based on the principle of discriminating
between a null hypothesis and the alternative hypothesis, by using that distribution
corresponding to the alternative hypothesis which for the sample values provides the least
information for discrimination against the null hypothesis, is shown and discussed more
generally by one of us (Kullback, 1956). The classical least-squares procedure of minimizing

\[ z'z = (y'-\beta'X')(y-X\beta) \]

leads to the normal equations

\[ S\hat\beta = X'y \tag{1·6} \]

whose solutions are minimum variance unbiased estimates (cf. Kempthorne, 1952;
Kolodziejczyk, 1935; Plackett, 1949), so that

\[ \hat J(1,2) = (\hat\beta^1-\beta^2)'\,S\,(\hat\beta^1-\beta^2)/\hat\sigma^2, \tag{1·7} \]

where $(n-p)\,\hat\sigma^2 = \hat z'\hat z = (y'-\hat\beta^{1\prime}X')(y-X\hat\beta^1) = y'y-\hat\beta^{1\prime}S\hat\beta^1$.

In particular, for the common null hypothesis, $H_2$: $\beta = \beta^2 = 0$, (1·7) becomes

\[ \hat J(1,2) = \hat\beta^{1\prime}S\hat\beta^1/\hat\sigma^2. \tag{1·8} \]









S. KULLBACK AND H. M. ROSENBLATT 69


It is a known result in regression theory that the components of $\hat\beta^1$ as linear functions of
the z's are normally distributed with covariance matrix given by $\sigma^2S^{-1}$ (cf. Kempthorne,
1952).

Under the null hypothesis, that $\beta = \beta^2 = 0$, $\hat J(1,2)$ in (1·8) is therefore Hotelling's
Generalized Student Ratio (Wilks, 1943, p. 238), or

\[ \hat J(1,2) = pF, \tag{1·9} \]

where F has the analysis of variance distribution with $n_1 = p$ and $n_2 = n-p$ degrees of
freedom. These results may also be summarized in the usual analysis of variance table
as given in Table 1·1, where

\[ \hat\beta^{1\prime}S\hat\beta^1 = \hat\beta^{1\prime}X'y = y'XS^{-1}X'y. \]

For the more general case, corresponding to the hypothesis, $H_2$: $\beta = \beta^2 \ne 0$, (1·9) still holds
with $\hat J(1,2)$ given by (1·7).


Table 1·1

Variation due to       D.F.     Sum of squares

Linear regression      p        $\hat\beta^{1\prime}S\hat\beta^1 = \hat\sigma^2\hat J(1,2)$
Difference             n − p    $y'y-\hat\beta^{1\prime}S\hat\beta^1 = (n-p)\,\hat\sigma^2$
Total                  n        $y'y$




2. SUB-HYPOTHESES, SINGLE CATEGORY* 


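Suppose we partition the parameters $\beta$ into two groups, which we will denote as $\beta_1$ and
$\beta_2$, so that in place of (1·4) we now consider

\[ z = y-(X_1, X_2)\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}, \tag{2·1} \]

where

\[ \beta = \begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} \quad\text{and}\quad X = (X_1, X_2) = \begin{pmatrix} x_{11} & \ldots & x_{1q} & x_{1q+1} & \ldots & x_{1p}\\ \vdots & & \vdots & \vdots & & \vdots\\ x_{n1} & \ldots & x_{nq} & x_{nq+1} & \ldots & x_{np}\end{pmatrix}, \]

with $X_1$ and $X_2$ respectively of ranks q and p − q. The z's are still assumed to be independent
and normally distributed with means zero and common variance $\sigma^2$, and corresponding
to $H_1$ and $H_2$,

\[
\begin{aligned}
E_1(y) &= X_1\beta_1^1+X_2\beta_2^1,\\
E_2(y) &= X_1\beta_1^2+X_2\beta_2^2.
\end{aligned}
\tag{2·2}
\]

It also follows that

\[ S = X'X = \begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}, \tag{2·3} \]

where $S_{11} = X_1'X_1$, $S_{12} = X_1'X_2 = S_{21}'$, $S_{22} = X_2'X_2$.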

* For an alternative treatment see Grundy (1951). We summarize the results here as a preliminary
aid for the discussion in § 4.










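For this case, (1·5) becomes

\[ J(1,2) = \frac{1}{\sigma^2}\,(\beta_1^{1\prime}-\beta_1^{2\prime},\ \beta_2^{1\prime}-\beta_2^{2\prime})\begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}\beta_1^1-\beta_1^2\\ \beta_2^1-\beta_2^2\end{pmatrix}. \tag{2·4} \]

The normal equations of (1·6) under $H_1$ become

\[ \begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}\hat\beta_1^1\\ \hat\beta_2^1\end{pmatrix} = \begin{pmatrix} X_1'y\\ X_2'y\end{pmatrix}, \tag{2·5} \]

or

\[
\begin{aligned}
S_{11}\hat\beta_1^1+S_{12}\hat\beta_2^1 &= X_1'y,\\
S_{21}\hat\beta_1^1+S_{22}\hat\beta_2^1 &= X_2'y,
\end{aligned}
\tag{2·6}
\]

and

\[ (n-p)\,\hat\sigma^2 = y'y-(\hat\beta_1^{1\prime}, \hat\beta_2^{1\prime})\begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}\hat\beta_1^1\\ \hat\beta_2^1\end{pmatrix}. \]

Let $S_{22.1} = S_{22}-S_{21}S_{11}^{-1}S_{12}$, $X_{2.1}' = X_2'-S_{21}S_{11}^{-1}X_1'$,
then (2·6) yields

\[ \hat\beta_2^1 = S_{22.1}^{-1}X_{2.1}'y, \tag{2·7} \]

\[ \hat\beta_1^1 = S_{11}^{-1}X_1'y-S_{11}^{-1}S_{12}\hat\beta_2^1. \tag{2·8} \]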

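It is useful to note (see, for example, Frazer, Duncan & Collar, 1938, para. 4·9) that

\[ S^{-1} = \begin{pmatrix} S^{11} & S^{12}\\ S^{21} & S^{22}\end{pmatrix} = \begin{pmatrix} S_{11}^{-1}+MS_{22.1}M' & M\\ M' & S_{22.1}^{-1}\end{pmatrix}, \]

where the q × (p − q) matrix

\[ M = -S_{11}^{-1}S_{12}S_{22.1}^{-1} = -S_{11.2}^{-1}S_{12}S_{22}^{-1}, \]

so that in the applications, the elements of the matrix $S_{22.1}^{-1}$, or $S^{22}$, are already obtained
once the matrix $S^{-1}$ is obtained.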
2 
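Suppose now that in particular we want to test the null hypothesis $H_2$: $\beta = \beta^2 = \begin{pmatrix}\beta_1^2\\ 0\end{pmatrix}$,
that is, $\beta_2^2 = 0$, with no restrictions on $\beta_1^2$, while under the alternative hypothesis
$H_1$: $\beta = \beta^1 = \begin{pmatrix}\beta_1^1\\ \beta_2^1\end{pmatrix}$ with no restrictions on the parameters. Again we estimate J(1, 2) by
replacing the parameters by the best unbiased estimates appropriate to the hypotheses.
Under $H_1$ we have the previous results for $\hat\beta_1^1$, $\hat\beta_2^1$ and $\hat\sigma^2$. Under $H_2$ the normal equations
of (1·6) now yield

\[ \hat\beta_1^2 = S_{11}^{-1}X_1'y. \tag{2·9} \]

From (2·4), (2·8) and (2·9) we have that

\[ \hat\sigma^2\hat J(1,2) = (-\hat\beta_2^{1\prime}S_{21}S_{11}^{-1},\ \hat\beta_2^{1\prime})\begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix} -S_{11}^{-1}S_{12}\hat\beta_2^1\\ \hat\beta_2^1\end{pmatrix} = \hat\beta_2^{1\prime}S_{22.1}\hat\beta_2^1. \tag{2·10} \]

It may be readily verified that

\[ XS^{-1}X' = X_{2.1}S_{22.1}^{-1}X_{2.1}'+X_1S_{11}^{-1}X_1', \tag{2·11} \]

that is,

\[ \hat\beta^{1\prime}S\hat\beta^1 = \hat\beta_2^{1\prime}S_{22.1}\hat\beta_2^1+\hat\beta_1^{2\prime}S_{11}\hat\beta_1^2, \tag{2·11·1} \]

or

\[ \hat\beta^{1\prime}X'y = \hat\beta_2^{1\prime}X_{2.1}'y+\hat\beta_1^{2\prime}X_1'y. \tag{2·11·2} \]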




The foregoing results may be summarized in the following analysis of variance (Table
2·1). $\hat J(1,2) = \hat\beta_2^{1\prime}S_{22.1}\hat\beta_2^1/\hat\sigma^2 = (p-q)F$, where F has the analysis of variance distribution
with $n_1 = p-q$ and $n_2 = n-p$ degrees of freedom, under the null hypothesis $H_2$ that $\beta_2^2 = 0$.




















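Table 2·1

Variation due to                                          D.F.     Sum of squares

$H_2$: $\beta^{2\prime} = (\beta_1^{2\prime}, 0')$        q        $\hat\beta_1^{2\prime}S_{11}\hat\beta_1^2$
Diff.                                                     p − q    $\hat\beta_2^{1\prime}S_{22.1}\hat\beta_2^1 = \hat\sigma^2\hat J(1,2)$
$H_1$: $\beta^{1\prime} = (\beta_1^{1\prime}, \beta_2^{1\prime})$   p        $\hat\beta^{1\prime}S\hat\beta^1$
Diff.                                                     n − p    $y'y-\hat\beta^{1\prime}S\hat\beta^1 = (n-p)\,\hat\sigma^2$
Total                                                     n        $y'y$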










3. ANALYSIS OF REGRESSION—ONE-WAY CLASSIFICATION, k CATEGORIES* 
Suppose we have k categories each with $n_j$ observations on $(y, x_1, \ldots, x_p)$ for which the
general linear regression for each category is

\[ z_{ji} = y_{ji}-(\beta_{j1}x_{ji1}+\ldots+\beta_{jr}x_{jir}+\ldots+\beta_{jp}x_{jip}), \tag{3·1} \]

where j = 1, 2, ..., k categories, i = 1, 2, ..., $n_j$ observations for category j, r = 1, 2, ..., p
independent variables ($p<n_j$), the $z_{ji}$ are independent, normally distributed with zero
means and common variance $\sigma^2$, and the $x_{jir}$ are known.
The linear regressions for each category can be written as

\[ z_j = y_j-X_j\beta_j, \tag{3·2} \]

where for j = 1, 2, ..., k

\[ z_j' = (z_{j1}, z_{j2}, \ldots, z_{jn_j}), \qquad y_j' = (y_{j1}, y_{j2}, \ldots, y_{jn_j}), \]
\[ X_j = (x_{j1}, x_{j2}, \ldots, x_{jp}), \qquad x_{jr}' = (x_{j1r}, x_{j2r}, \ldots, x_{jn_jr}) \]

and $\beta_j' = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jp})$.
We may write the k sets of regression equations (3·2) for k categories combined as

\[ z = y-X\beta, \tag{3·3} \]

by defining

\[ X = \begin{pmatrix} X_1 & & & 0\\ & X_2 & & \\ & & \ddots & \\ 0 & & & X_k\end{pmatrix}, \qquad \beta' = (\beta_1', \beta_2', \ldots, \beta_k'), \]
\[ z' = (z_1', z_2', \ldots, z_k'), \qquad y' = (y_1', y_2', \ldots, y_k'). \]

By the preceding definitions we consider $\beta$ in (3·3) as a parameter matrix of all kp regression
coefficients $\beta_{jr}$, whether or not any of them are equal, or have a particular value including
zero, according to any hypothesis.

* For the case p= 1, see Kendall (1946) and Welch (1935). 













Suppose, however, we specify a null hypothesis with regard to certain groups or sets of
the kp parameters $\beta_{jr}$ among the k categories, and wish to estimate the parameters and test
the hypothesis against some alternative. To distinguish between matrices or parameter
vectors under various hypotheses $H_\alpha$ ($\alpha$ = 1, 2, ...) we will use, where desirable for clarity
or emphasis, the notation $X^\alpha$, $\beta^\alpha$ and $S^\alpha = X^{\alpha\prime}X^\alpha$. Where this notation is not used the
applicable hypothesis and definition of the matrices will be clear from the context. For any
hypothesis, $H_\alpha$, we will represent the linear regressions for the k categories combined
under $H_\alpha$ as

\[ z = y-X^\alpha\beta^\alpha, \tag{3·4} \]

where z and y are specified as in (3·3); however, we now define $\beta^\alpha$ as the matrix of distinct
regression coefficients specified by hypothesis $H_\alpha$, and $X^\alpha$, the matrix of $x_{jir}$ with distinct
regression effects, specified according to the regression model defined by the hypothesis
for the k categories combined.

With the representation (3·4) of the k category regression under $H_\alpha$, the normal equations
(1·6) become

\[ S^\alpha\hat\beta^\alpha = X^{\alpha\prime}y \tag{3·5} \]

and $\hat\beta^\alpha = (S^\alpha)^{-1}X^{\alpha\prime}y$,

where the elements of $S^\alpha = X^{\alpha\prime}X^\alpha$ will, of course, depend on the particular specification of
the matrix $X^\alpha$.


Also, equivalent to (1·7) we have, for a null hypothesis $H_2$ and an alternative $H_1$,

\[ \hat J(1,2) = (\hat\beta^1-\hat\beta^2)'\,S\,(\hat\beta^1-\hat\beta^2)/\hat\sigma^2 = (\hat\beta^{1\prime}S^1\hat\beta^1-\hat\beta^{2\prime}S^2\hat\beta^2)/\hat\sigma^2, \tag{3·6} \]

where

\[
\left.
\begin{aligned}
(N-pk)\,\hat\sigma^2 &= y'y-\hat\beta^{1\prime}S^1\hat\beta^1,\\
N &= n_1+n_2+\ldots+n_k,
\end{aligned}
\right\}
\tag{3·7}
\]

and $S = X'X = S^1$ for X defined in (3·3). Thus, for any particular hypothesis on the sets
of regression coefficients in k category regression, the estimates for the coefficients and the
test of the hypothesis are readily obtained solely by proper specification of the matrices
$X^\alpha$ and $\beta^\alpha$ in (3·4).

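Consider the two hypotheses,

\[ H_1:\ \beta_{jr} = \beta_{jr} \qquad (j = 1, 2, \ldots, k;\ r = 1, 2, \ldots, p), \tag{3·8} \]

i.e. the $\beta_{jr}$ are different for all categories and for each r = 1, 2, ..., p; and the null hypothesis

\[ H_2:\ \beta_{jr} = \beta_{.r} \qquad (j = 1, 2, \ldots, k;\ r = 1, 2, \ldots, p), \tag{3·9} \]

or equivalently, $\beta_j' = \beta_.' = (\beta_{.1}, \beta_{.2}, \ldots, \beta_{.p})$ (j = 1, 2, ..., k),

i.e. the regression coefficients are the same for the different categories for each r = 1, 2, ..., p.
Corresponding to $H_1$ in (3·8) the best unbiased estimate of $\beta$ is derived from (3·5), where,
for the hypothesis $H_1$, $\beta^\alpha$ and $X^\alpha$ in (3·4), defining the k category regression model, are the
same as $\beta$ and X in (3·3), or

\[ \begin{pmatrix} S_1 & & 0\\ & \ddots & \\ 0 & & S_k\end{pmatrix}\begin{pmatrix}\hat\beta_1\\ \vdots\\ \hat\beta_k\end{pmatrix} = \begin{pmatrix} X_1'y_1\\ \vdots\\ X_k'y_k\end{pmatrix}, \tag{3·10} \]

that is, k sets of normal equations

\[ S_j\hat\beta_j = X_j'y_j \qquad (j = 1, 2, \ldots, k), \tag{3·11} \]

from which $\hat\beta_j = S_j^{-1}X_j'y_j$.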







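Corresponding to $H_2$ in (3·9), however, the matrices $X^2$ and $\beta^2$ of (3·4), defining the k
category regression model, are

\[ X^{2\prime} = (X_1', X_2', \ldots, X_k'), \qquad \beta^{2\prime} = (\beta_{.1}, \beta_{.2}, \ldots, \beta_{.p}). \]

Therefore,

\[ S^2 = X^{2\prime}X^2 = \sum_{j=1}^{k}X_j'X_j = \sum_{j=1}^{k}S_j, \qquad X^{2\prime}y = \sum_{j=1}^{k}X_j'y_j, \]

and the best unbiased estimate of $\beta$ under $H_2$ is derived from (3·5) as

\[ \hat\beta^2 = (S^2)^{-1}X^{2\prime}y. \tag{3·12} \]

We also have, under $H_1$, corresponding to (3·7)

\[ (N-pk)\,\hat\sigma^2 = y'y-\hat\beta^{1\prime}S^1\hat\beta^1 = \sum_{j=1}^{k}(y_j'y_j-\hat\beta_j'S_j\hat\beta_j). \tag{3·13} \]

Corresponding to (3·6) we therefore have

\[ \hat\sigma^2\hat J(1,2) = \hat\beta^{1\prime}S^1\hat\beta^1-\hat\beta^{2\prime}S^2\hat\beta^2 = \sum_{j=1}^{k}\hat\beta_j'S_j\hat\beta_j-\hat\beta^{2\prime}S^2\hat\beta^2. \tag{3·14} \]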
ja 


This latter result is a direct generalization of Kendall’s S, (1946, § 24- 30). 

We may therefore summarize in the analysis of variance Table 3·1. $\hat J(1,2) = p(k-1)F$
where F has the analysis of variance distribution with p(k − 1) and (N − pk) degrees of
freedom when the null hypothesis $H_2$ of (3·9) is true.


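Table 3·1

Variation due to               D.F.               Sum of squares

$H_2$: $\beta_j = \beta_.$     p                  $\hat\beta^{2\prime}S^2\hat\beta^2$
Diff.                          p(k − 1)           $\hat\beta^{1\prime}S^1\hat\beta^1-\hat\beta^{2\prime}S^2\hat\beta^2 = \hat\sigma^2\hat J(1,2)$
$H_1$: $\beta^1 = \beta$       pk                 $\hat\beta^{1\prime}S^1\hat\beta^1 = \sum_{j=1}^{k}\hat\beta_j'S_j\hat\beta_j$
Diff.                          N − pk             $y'y-\hat\beta^{1\prime}S^1\hat\beta^1 = (N-pk)\,\hat\sigma^2$
Total                          N = $\sum n_j$     $y'y$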




4, SUB-HYPOTHESIS—ONE-WAY CLASSIFICATION, k CATEGORIES 


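4·1. Two-partition sub-hypothesis. Let us partition the parameters of the matrix $\beta_j$,
for each category j = 1, 2, ..., k, into two parts

\[ \boldsymbol\beta_{j1}' = (\beta_{j1}, \ldots, \beta_{jq}) \quad\text{and}\quad \boldsymbol\beta_{j2}' = (\beta_{jq+1}, \ldots, \beta_{jp}) \]

of q and p − q parameters respectively, q < p, so that $\beta_j' = (\boldsymbol\beta_{j1}', \boldsymbol\beta_{j2}')$.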










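Consider a sub-hypothesis, $H_2$, which states that for $j = 1, 2, \ldots, k$, the $\beta_{jr}$ are different
for $r = 1, 2, \ldots, q$, while for $r = q+1, q+2, \ldots, p$, there is a common value $\beta_{\cdot r}$ for the $\beta_{jr}$; thus

$$H_2\colon\ \beta_{jr} = \beta_{jr} \quad (j = 1, 2, \ldots, k;\ r = 1, 2, \ldots, q),$$
$$\phantom{H_2\colon\ }\beta_{jr} = \beta_{\cdot r} \quad (j = 1, 2, \ldots, k;\ r = q+1, q+2, \ldots, p), \tag{4-1}$$

or equivalently

$$H_2\colon\ \boldsymbol\beta_{j1}' = \boldsymbol\beta_{j1}' = (\beta_{j1}, \ldots, \beta_{jq}), \qquad \boldsymbol\beta_{j2}' = \boldsymbol\beta_{\cdot 2}' = (\beta_{\cdot q+1}, \ldots, \beta_{\cdot p}).$$

Let $H_1$ remain as in (3-8), that is, the $\beta_{jr}$ are different for all j and r. Then under $H_1$ we have
the same matrix definitions and results as in §3. However, for $H_2$ as given in (4-1), the
matrices $\mathbf X^2$ and $\boldsymbol\beta^2$ for the k category regression model are

$$\boldsymbol\beta^{2\prime} = (\boldsymbol\beta_1', \boldsymbol\beta_2'), \qquad \mathbf X^2 = (\mathbf X_1, \mathbf X_2),$$

where

$$\boldsymbol\beta_1' = (\boldsymbol\beta_{11}', \boldsymbol\beta_{21}', \ldots, \boldsymbol\beta_{k1}'), \qquad \boldsymbol\beta_2' = \boldsymbol\beta_{\cdot 2}' = (\beta_{\cdot q+1}, \beta_{\cdot q+2}, \ldots, \beta_{\cdot p}),$$

$$\mathbf X_1 = \begin{pmatrix}\mathbf X_{11} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf X_{k1}\end{pmatrix}, \qquad \mathbf X_2 = \begin{pmatrix}\mathbf X_{12}\\ \vdots\\ \mathbf X_{k2}\end{pmatrix},$$

$$\mathbf X_{j1} = (\mathbf x_{j1}, \mathbf x_{j2}, \ldots, \mathbf x_{jq}), \qquad \mathbf X_{j2} = (\mathbf x_{jq+1}, \mathbf x_{jq+2}, \ldots, \mathbf x_{jp}),$$

$$\mathbf x_{jr}' = (x_{j1r}, x_{j2r}, \ldots, x_{jn_jr}) \quad (j = 1, 2, \ldots, k;\ r = 1, 2, \ldots, p).$$

Thus under $H_2$

$$\mathbf S^2 = \mathbf X^{2\prime}\mathbf X^2 = \begin{pmatrix}\mathbf X_1'\\ \mathbf X_2'\end{pmatrix}(\mathbf X_1, \mathbf X_2) = \begin{pmatrix}\mathbf S_{11} & \mathbf S_{12}\\ \mathbf S_{21} & \mathbf S_{22}\end{pmatrix},$$

$$\mathbf S_{11} = \begin{pmatrix}\mathbf S_{111} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf S_{k11}\end{pmatrix}, \qquad \mathbf S_{12} = \begin{pmatrix}\mathbf S_{112}\\ \vdots\\ \mathbf S_{k12}\end{pmatrix} = \mathbf S_{21}', \qquad \mathbf S_{22} = (\mathbf S_{122} + \ldots + \mathbf S_{k22}),$$

and where $\mathbf S_{j11} = \mathbf X_{j1}'\mathbf X_{j1}$, $\mathbf S_{j12} = \mathbf X_{j1}'\mathbf X_{j2} = \mathbf S_{j21}'$, $\mathbf S_{j22} = \mathbf X_{j2}'\mathbf X_{j2}$.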


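From the normal equations (3-5) we now obtain

$$\left.\begin{aligned}\mathbf S_{11}\hat{\boldsymbol\beta}_1 + \mathbf S_{12}\hat{\boldsymbol\beta}_2 &= \mathbf X_1'\mathbf y,\\ \mathbf S_{21}\hat{\boldsymbol\beta}_1 + \mathbf S_{22}\hat{\boldsymbol\beta}_2 &= \mathbf X_2'\mathbf y,\end{aligned}\right\} \tag{4-2}$$

so that as in §2 we have

$$\hat{\boldsymbol\beta}_2 = \mathbf S_{22.1}^{-1}\mathbf X_{2.1}'\mathbf y, \tag{4-3}$$

where

$$\mathbf S_{22.1} = (\mathbf S_{22} - \mathbf S_{21}\mathbf S_{11}^{-1}\mathbf S_{12}) = \sum_{j=1}^{k}(\mathbf S_{j22} - \mathbf S_{j21}\mathbf S_{j11}^{-1}\mathbf S_{j12}) = \sum_{j=1}^{k}\mathbf S_{j22.1},$$

$$\mathbf X_{2.1}' = (\mathbf X_{12.1}', \ldots, \mathbf X_{k2.1}') \quad\text{and}\quad \mathbf X_{2.1}'\mathbf y = \sum_{j=1}^{k}\mathbf X_{j2.1}'\mathbf y_j.$$

Also as in (2-8), and from the definition of matrices under $H_2$ we have

$$\hat{\boldsymbol\beta}_1 = \begin{pmatrix}\mathbf S_{111}^{-1} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf S_{k11}^{-1}\end{pmatrix}\begin{pmatrix}\mathbf X_{11}' & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf X_{k1}'\end{pmatrix}\begin{pmatrix}\mathbf y_1\\ \vdots\\ \mathbf y_k\end{pmatrix} - \begin{pmatrix}\mathbf S_{111}^{-1} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf S_{k11}^{-1}\end{pmatrix}\begin{pmatrix}\mathbf S_{112}\\ \vdots\\ \mathbf S_{k12}\end{pmatrix}\hat{\boldsymbol\beta}_{\cdot 2}$$

$$\phantom{\hat{\boldsymbol\beta}_1} = \begin{pmatrix}\mathbf S_{111}^{-1}\mathbf X_{11}'\mathbf y_1\\ \vdots\\ \mathbf S_{k11}^{-1}\mathbf X_{k1}'\mathbf y_k\end{pmatrix} - \begin{pmatrix}\mathbf S_{111}^{-1}\mathbf S_{112}\hat{\boldsymbol\beta}_{\cdot 2}\\ \vdots\\ \mathbf S_{k11}^{-1}\mathbf S_{k12}\hat{\boldsymbol\beta}_{\cdot 2}\end{pmatrix}. \tag{4-4}$$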


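Thus under $H_2$ of (4-1), we have the following estimates of the regression coefficients:

$$\left.\begin{aligned}\hat{\boldsymbol\beta}_{j1} &= \mathbf S_{j11}^{-1}(\mathbf X_{j1}'\mathbf y_j - \mathbf S_{j12}\hat{\boldsymbol\beta}_{\cdot 2}) = \hat{\boldsymbol\beta}_{j1}^{2} \quad (j = 1, 2, \ldots, k),\\ \hat{\boldsymbol\beta}_{\cdot 2} &= \Big(\sum_{j=1}^{k}\mathbf S_{j22.1}\Big)^{-1}\sum_{j=1}^{k}\mathbf X_{j2.1}'\mathbf y_j = \hat{\boldsymbol\beta}_{\cdot 2}^{2},\end{aligned}\right\} \tag{4-5}$$

where $\hat{\boldsymbol\beta}_j' = (\hat{\boldsymbol\beta}_{j1}^{2\prime}, \hat{\boldsymbol\beta}_{\cdot 2}^{2\prime}) = \hat{\boldsymbol\beta}_j^{2\prime}$.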
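
The estimates (4-5) can be checked numerically against a direct least-squares fit of the full design matrix $(\mathbf X_1, \mathbf X_2)$. The sketch below is an illustration under assumed, simulated data; the function name `fit_common_second_partition` and all sizes are hypothetical.

```python
import numpy as np

def fit_common_second_partition(X1s, X2s, ys):
    """Estimates (4-5): separate beta_j1 per category, common beta_.2."""
    S22_1, rhs, parts = 0.0, 0.0, []
    for X1, X2, y in zip(X1s, X2s, ys):
        S11, S12 = X1.T @ X1, X1.T @ X2
        X2_1 = X2 - X1 @ np.linalg.solve(S11, S12)   # X_{j2.1}
        S22_1 = S22_1 + X2_1.T @ X2_1                # accumulates sum_j S_{j22.1}
        rhs = rhs + X2_1.T @ y                       # accumulates sum_j X'_{j2.1} y_j
        parts.append((S11, S12, X1.T @ y))
    b_common = np.linalg.solve(S22_1, rhs)
    b_sep = [np.linalg.solve(S11, X1y - S12 @ b_common) for S11, S12, X1y in parts]
    return b_sep, b_common

# Illustrative check against a direct least-squares fit of the full design.
rng = np.random.default_rng(2)
n, q, r = [7, 8, 9], 2, 2                 # r plays the role of p - q
X1s = [rng.normal(size=(nj, q)) for nj in n]
X2s = [rng.normal(size=(nj, r)) for nj in n]
ys = [rng.normal(size=nj) for nj in n]
b_sep, b_common = fit_common_second_partition(X1s, X2s, ys)

k = len(n)
X = np.zeros((sum(n), q * k + r))
row = 0
for j in range(k):
    X[row:row + n[j], j * q:(j + 1) * q] = X1s[j]   # block-diagonal X_1
    X[row:row + n[j], q * k:] = X2s[j]              # stacked X_2
    row += n[j]
b_direct = np.linalg.lstsq(X, np.concatenate(ys), rcond=None)[0]
agree = np.allclose(np.concatenate(b_sep + [b_common]), b_direct)
print(agree)
```

Only matrices of order q and p - q are inverted, which is the computational advantage of the partitioned form.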


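If under $H_1$ of (3-8) we also define $\boldsymbol\beta_j' = (\boldsymbol\beta_{j1}', \boldsymbol\beta_{j2}')$ but rearrange and partition the sub-matrices
of $\boldsymbol\beta$ and $\mathbf X$ so that

$$\boldsymbol\beta' = (\boldsymbol\beta_1', \boldsymbol\beta_2'), \qquad \mathbf X = (\mathbf X_1, \mathbf X_2),$$

where

$$\boldsymbol\beta_1' = (\boldsymbol\beta_{11}', \boldsymbol\beta_{21}', \ldots, \boldsymbol\beta_{k1}'), \qquad \boldsymbol\beta_{j1}' = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jq}),$$
$$\boldsymbol\beta_2' = (\boldsymbol\beta_{12}', \boldsymbol\beta_{22}', \ldots, \boldsymbol\beta_{k2}'), \qquad \boldsymbol\beta_{j2}' = (\beta_{jq+1}, \beta_{jq+2}, \ldots, \beta_{jp}),$$

$$\mathbf X_1 = \begin{pmatrix}\mathbf X_{11} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf X_{k1}\end{pmatrix}, \qquad \mathbf X_2 = \begin{pmatrix}\mathbf X_{12} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf X_{k2}\end{pmatrix},$$

$$\mathbf X_{j1} = (\mathbf x_{j1}, \mathbf x_{j2}, \ldots, \mathbf x_{jq}), \qquad \mathbf X_{j2} = (\mathbf x_{jq+1}, \mathbf x_{jq+2}, \ldots, \mathbf x_{jp}) \quad (j = 1, 2, \ldots, k),$$

then

$$\mathbf X_1'\mathbf X_1 = \mathbf S_{11} = \begin{pmatrix}\mathbf S_{111} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf S_{k11}\end{pmatrix}, \qquad \mathbf X_1'\mathbf X_2 = \mathbf S_{12} = \begin{pmatrix}\mathbf S_{112} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf S_{k12}\end{pmatrix} = \mathbf S_{21}',$$

$$\mathbf X_2'\mathbf X_2 = \mathbf S_{22} = \begin{pmatrix}\mathbf S_{122} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf S_{k22}\end{pmatrix}, \qquad \mathbf X_2'\mathbf X_1 = \mathbf S_{21} = \begin{pmatrix}\mathbf S_{121} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf S_{k21}\end{pmatrix}.$$

We then obtain the same estimate of $\boldsymbol\beta_j$ $(j = 1, 2, \ldots, k)$, as in §3, by the procedure of §2,
that is, from (3-5) we have

$$\left.\begin{aligned}\mathbf S_{11}\hat{\boldsymbol\beta}_1 + \mathbf S_{12}\hat{\boldsymbol\beta}_2 &= \mathbf X_1'\mathbf y,\\ \mathbf S_{21}\hat{\boldsymbol\beta}_1 + \mathbf S_{22}\hat{\boldsymbol\beta}_2 &= \mathbf X_2'\mathbf y,\end{aligned}\right\} \tag{4-6}$$

$$\hat{\boldsymbol\beta}_1 = \mathbf S_{11}^{-1}(\mathbf X_1'\mathbf y - \mathbf S_{12}\hat{\boldsymbol\beta}_2), \tag{4-7}$$

$$\hat{\boldsymbol\beta}_2 = \mathbf S_{22.1}^{-1}\mathbf X_{2.1}'\mathbf y, \tag{4-8}$$

where

$$\mathbf S_{22.1} = (\mathbf S_{22} - \mathbf S_{21}\mathbf S_{11}^{-1}\mathbf S_{12}) = \begin{pmatrix}\mathbf S_{122.1} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf S_{k22.1}\end{pmatrix},$$

$$\mathbf S_{j22.1} = (\mathbf S_{j22} - \mathbf S_{j21}\mathbf S_{j11}^{-1}\mathbf S_{j12}) \quad (j = 1, 2, \ldots, k),$$

$$\mathbf X_{2.1}' = \mathbf X_2' - \mathbf S_{21}\mathbf S_{11}^{-1}\mathbf X_1' = \begin{pmatrix}\mathbf X_{12.1}' & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf X_{k2.1}'\end{pmatrix},$$

$$\mathbf X_{j2.1}' = \mathbf X_{j2}' - \mathbf S_{j21}\mathbf S_{j11}^{-1}\mathbf X_{j1}' \quad (j = 1, 2, \ldots, k),$$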










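so that from (4-7) we obtain under $H_1$ for each category $j = 1, 2, \ldots, k$,

$$\hat{\boldsymbol\beta}_{j1} = \mathbf S_{j11}^{-1}(\mathbf X_{j1}'\mathbf y_j - \mathbf S_{j12}\hat{\boldsymbol\beta}_{j2}) = \hat{\boldsymbol\beta}_{j1}^{1}, \tag{4-9}$$

$$\hat{\boldsymbol\beta}_{j2} = \mathbf S_{j22.1}^{-1}\mathbf X_{j2.1}'\mathbf y_j = \hat{\boldsymbol\beta}_{j2}^{1}, \tag{4-10}$$

and $\hat{\boldsymbol\beta}_j' = (\hat{\boldsymbol\beta}_{j1}^{1\prime}, \hat{\boldsymbol\beta}_{j2}^{1\prime}) = \hat{\boldsymbol\beta}_j^{1\prime}$.

With these estimates of the parameters under $H_1$ of (3-8) and $H_2$ of (4-1) and noting after
some reduction that

$$\hat{\boldsymbol\beta}^{\alpha\prime}\mathbf S^{\alpha}\hat{\boldsymbol\beta}^{\alpha} = \mathbf y'\mathbf X_1\mathbf S_{11}^{-1}\mathbf X_1'\mathbf y + \hat{\boldsymbol\beta}_2^{\alpha\prime}\mathbf S_{22.1}^{\alpha}\hat{\boldsymbol\beta}_2^{\alpha} \quad (\alpha = 1, 2), \tag{4-11}$$

we obtain

$$\hat\sigma^2\hat J(1,2) = \hat{\boldsymbol\beta}_2^{1\prime}\mathbf S_{22.1}^{1}\hat{\boldsymbol\beta}_2^{1} - \hat{\boldsymbol\beta}_2^{2\prime}\mathbf S_{22.1}^{2}\hat{\boldsymbol\beta}_2^{2} = \sum_{j=1}^{k}\hat{\boldsymbol\beta}_{j2}^{1\prime}\mathbf S_{j22.1}\hat{\boldsymbol\beta}_{j2}^{1} - \hat{\boldsymbol\beta}_{\cdot 2}^{2\prime}\mathbf S_{22.1}^{2}\hat{\boldsymbol\beta}_{\cdot 2}^{2}, \tag{4-12}$$

where for computational convenience we may write

$$\hat{\boldsymbol\beta}_2^{1\prime}\mathbf S_{22.1}^{1}\hat{\boldsymbol\beta}_2^{1} = \hat{\boldsymbol\beta}_2^{1\prime}\mathbf X_{2.1}^{1\prime}\mathbf y = \sum_{j=1}^{k}\hat{\boldsymbol\beta}_{j2}^{1\prime}\mathbf X_{j2.1}'\mathbf y_j, \tag{4-13}$$

$$\hat{\boldsymbol\beta}_2^{2\prime}\mathbf S_{22.1}^{2}\hat{\boldsymbol\beta}_2^{2} = \hat{\boldsymbol\beta}_{\cdot 2}^{2\prime}\mathbf X_{2.1}^{2\prime}\mathbf y. \tag{4-14}$$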


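We may therefore summarize in the analysis of variance Table 4-1, where

$$\hat J(1,2) = (p - q)(k - 1)F,$$

and F has the analysis of variance distribution with $(p - q)(k - 1)$ and $N - pk$ degrees of
freedom when the null hypothesis $H_2$ of (4-1) is true.


Table 4-1

Variation due to | D.F. | Sum of squares
$H_2$: $\boldsymbol\beta_{j1}$, $\boldsymbol\beta_{\cdot 2}$ | $qk + p - q$ | $\hat{\boldsymbol\beta}^{2\prime}\mathbf S^2\hat{\boldsymbol\beta}^2$
Diff. | $(p - q)(k - 1)$ | $\hat{\boldsymbol\beta}_2^{1\prime}\mathbf S_{22.1}^{1}\hat{\boldsymbol\beta}_2^{1} - \hat{\boldsymbol\beta}_2^{2\prime}\mathbf S_{22.1}^{2}\hat{\boldsymbol\beta}_2^{2} = \hat\sigma^2\hat J(1,2)$
$H_1$: $\boldsymbol\beta_{j1}$, $\boldsymbol\beta_{j2}$ | $pk$ | $\hat{\boldsymbol\beta}^{1\prime}\mathbf S^1\hat{\boldsymbol\beta}^1$
Diff. | $N - pk$ | $(N - pk)\hat\sigma^2 = \mathbf y'\mathbf y - \hat{\boldsymbol\beta}^{1\prime}\mathbf S^1\hat{\boldsymbol\beta}^1$
Total | $N = n_1 + n_2 + \ldots + n_k$ | $\mathbf y'\mathbf y$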








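4-2. Three-partition sub-hypothesis. If the sub-hypothesis requires partitioning the
matrices $\mathbf X$ and $\boldsymbol\beta$ into three submatrices, so that

$$\boldsymbol\beta' = (\boldsymbol\beta_1', \boldsymbol\beta_2', \boldsymbol\beta_3') \quad\text{and}\quad \mathbf X = (\mathbf X_1, \mathbf X_2, \mathbf X_3),$$

we obtain from $\mathbf S\hat{\boldsymbol\beta} = \mathbf X'\mathbf y$ the solutions

$$\left.\begin{aligned}\hat{\boldsymbol\beta}_3 &= \mathbf S_{33.12}^{-1}\mathbf X_{3.12}'\mathbf y,\\ \hat{\boldsymbol\beta}_2 &= \mathbf S_{22.1}^{-1}(\mathbf X_{2.1}'\mathbf y - \mathbf S_{23.1}\hat{\boldsymbol\beta}_3),\\ \hat{\boldsymbol\beta}_1 &= \mathbf S_{11}^{-1}(\mathbf X_1'\mathbf y - \mathbf S_{12}\hat{\boldsymbol\beta}_2 - \mathbf S_{13}\hat{\boldsymbol\beta}_3),\end{aligned}\right\} \tag{4-15}$$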







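where

$$\mathbf S = \begin{pmatrix}\mathbf S_{11} & \mathbf S_{12} & \mathbf S_{13}\\ \mathbf S_{21} & \mathbf S_{22} & \mathbf S_{23}\\ \mathbf S_{31} & \mathbf S_{32} & \mathbf S_{33}\end{pmatrix}, \qquad \mathbf S_{tu} = \mathbf X_t'\mathbf X_u \quad (t, u = 1, 2, 3),$$

and

$$\mathbf S_{33.12} = \mathbf S_{33.1} - \mathbf S_{32.1}\mathbf S_{22.1}^{-1}\mathbf S_{23.1},$$
$$\mathbf S_{33.1} = \mathbf S_{33} - \mathbf S_{31}\mathbf S_{11}^{-1}\mathbf S_{13}, \qquad \mathbf S_{32.1} = \mathbf S_{32} - \mathbf S_{31}\mathbf S_{11}^{-1}\mathbf S_{12} = \mathbf S_{23.1}',$$
$$\mathbf S_{22.1} = \mathbf S_{22} - \mathbf S_{21}\mathbf S_{11}^{-1}\mathbf S_{12}, \qquad \mathbf X_{2.1}'\mathbf y = (\mathbf X_2' - \mathbf S_{21}\mathbf S_{11}^{-1}\mathbf X_1')\mathbf y,$$
$$\mathbf X_{3.12}'\mathbf y = (\mathbf X_{3.1}' - \mathbf S_{32.1}\mathbf S_{22.1}^{-1}\mathbf X_{2.1}')\mathbf y, \qquad \mathbf X_{3.1}'\mathbf y = (\mathbf X_3' - \mathbf S_{31}\mathbf S_{11}^{-1}\mathbf X_1')\mathbf y.$$

Also (compare with (4-11))

$$\hat{\boldsymbol\beta}'\mathbf S\hat{\boldsymbol\beta} = \mathbf y'\mathbf X_1\mathbf S_{11}^{-1}\mathbf X_1'\mathbf y + \mathbf y'\mathbf X_{2.1}\mathbf S_{22.1}^{-1}\mathbf X_{2.1}'\mathbf y + \hat{\boldsymbol\beta}_3'\mathbf S_{33.12}\hat{\boldsymbol\beta}_3. \tag{4-16}$$

Using (4-15) and collecting terms we obtain other useful forms of equation (4-16), e.g.

$$\hat{\boldsymbol\beta}'\mathbf S\hat{\boldsymbol\beta} = \mathbf y'\mathbf X_1\mathbf S_{11}^{-1}\mathbf X_1'\mathbf y + \hat{\boldsymbol\beta}_2'\mathbf X_{2.1}'\mathbf y + \hat{\boldsymbol\beta}_3'\mathbf X_{3.1}'\mathbf y, \tag{4-16-1}$$

which is convenient when the data are raw observations and $x_{ji1} = 1$ for all j and i, so that
the first partition serves to include the $\bar x_{jr}$ and $\bar y_j$ and obtain deviations about average
values for basically a two-partition problem, and

$$\hat{\boldsymbol\beta}'\mathbf S\hat{\boldsymbol\beta} = \hat{\boldsymbol\beta}_1'\mathbf X_1'\mathbf y + \hat{\boldsymbol\beta}_2'\mathbf X_2'\mathbf y + \hat{\boldsymbol\beta}_3'\mathbf X_3'\mathbf y \tag{4-16-2}$$

for a three-partition problem where the variables already are deviations about average
values.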


The above discussion, if required, can readily be extended by induction to any number 
of partitions. 
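
The successive elimination in (4-15) can be sketched as below and compared with a direct least-squares solution. The helper name `three_partition_solve` and the simulated data are assumptions for illustration; the sketch also checks the sum-of-squares identity (4-16-2) numerically.

```python
import numpy as np

def three_partition_solve(X1, X2, X3, y):
    """Back-solve (4-15) using the reduced matrices defined after it."""
    S11 = X1.T @ X1
    X2_1 = X2 - X1 @ np.linalg.solve(S11, X1.T @ X2)     # X_{2.1}
    X3_1 = X3 - X1 @ np.linalg.solve(S11, X1.T @ X3)     # X_{3.1}
    S22_1 = X2_1.T @ X2_1                                # S_{22.1}
    S23_1 = X2_1.T @ X3_1                                # S_{23.1}
    X3_12 = X3_1 - X2_1 @ np.linalg.solve(S22_1, S23_1)  # X_{3.12}
    S33_12 = X3_12.T @ X3_12                             # S_{33.12}
    b3 = np.linalg.solve(S33_12, X3_12.T @ y)
    b2 = np.linalg.solve(S22_1, X2_1.T @ y - S23_1 @ b3)
    b1 = np.linalg.solve(S11, X1.T @ y - X1.T @ X2 @ b2 - X1.T @ X3 @ b3)
    return b1, b2, b3

# Illustrative data (hypothetical sizes and seed).
rng = np.random.default_rng(3)
X1, X2, X3 = (rng.normal(size=(20, m)) for m in (2, 3, 2))
y = rng.normal(size=20)
b1, b2, b3 = three_partition_solve(X1, X2, X3, y)

X = np.hstack([X1, X2, X3])
b_direct = np.linalg.lstsq(X, y, rcond=None)[0]
same_coefficients = np.allclose(np.concatenate([b1, b2, b3]), b_direct)

# (4-16-2): regression sum of squares as b1'X1'y + b2'X2'y + b3'X3'y
ss_partitioned = b1 @ X1.T @ y + b2 @ X2.T @ y + b3 @ X3.T @ y
ss_direct = (X @ b_direct) @ (X @ b_direct)
same_ss = np.isclose(ss_partitioned, ss_direct)
print(same_coefficients, same_ss)
```

The same pattern of residualizing each later partition on the earlier ones extends to further partitions.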


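4-3. Carter's regression case. Carter (1949) considers the case of a correlation effect
among the ith observations $(i = 1, 2, \ldots, n)$ in each of k samples. His regression model can
be written as

$$z_{ji} = y_{ji} - \sum_{r=1}^{q}\beta_{jr}x_{jir} - \alpha_i, \tag{4-17}$$

where the correlation effect among samples is due to $\alpha_i$, an element common to the ith
observation in each sample, $j = 1, 2, \ldots, k$.
It can be seen that this model is a particular case of the sub-hypothesis analysis where
the $\mathbf X$ and $\boldsymbol\beta$ matrices are

$$\boldsymbol\beta' = (\boldsymbol\beta_1', \boldsymbol\beta_2'), \qquad \mathbf X = (\mathbf X_1, \mathbf X_2),$$

and the submatrices are

$$\boldsymbol\beta_1' = (\boldsymbol\beta_{11}', \boldsymbol\beta_{21}', \ldots, \boldsymbol\beta_{k1}'), \qquad \boldsymbol\beta_{j1}' = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jq}), \qquad \boldsymbol\beta_2' = (\alpha_1, \alpha_2, \ldots, \alpha_n),$$

$$\mathbf X_1 = \begin{pmatrix}\mathbf X_{11} & & \mathbf 0\\ & \ddots & \\ \mathbf 0 & & \mathbf X_{k1}\end{pmatrix}, \qquad \mathbf X_{j1} = \begin{pmatrix}x_{j11} & \cdots & x_{j1q}\\ \vdots & & \vdots\\ x_{jn1} & \cdots & x_{jnq}\end{pmatrix}, \qquad \mathbf X_2 = \begin{pmatrix}\mathbf I\\ \vdots\\ \mathbf I\end{pmatrix},$$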










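where $\mathbf X_2$ is a k x 1 matrix of submatrices $\mathbf I$, the identity matrix of order n x n. With these
definitions of $\mathbf X$ and $\boldsymbol\beta$, Carter's normal equations (his (3-3)) for estimating the $\beta$'s follow
directly from the normal equations (4-2) by obtaining

$$\mathbf S_{11.2}\hat{\boldsymbol\beta}_1 = \mathbf X_{1.2}'\mathbf y, \tag{4-18}$$

where $\mathbf S_{11.2} = \mathbf S_{11} - \mathbf S_{12}\mathbf S_{22}^{-1}\mathbf S_{21}$.
For this case we obtain

$$\mathbf S_{11.2} = \begin{pmatrix}\left(1 - \frac1k\right)\mathbf X_{11}'\mathbf X_{11} & -\frac1k\mathbf X_{11}'\mathbf X_{21} & \cdots & -\frac1k\mathbf X_{11}'\mathbf X_{k1}\\ -\frac1k\mathbf X_{21}'\mathbf X_{11} & \left(1 - \frac1k\right)\mathbf X_{21}'\mathbf X_{21} & \cdots & -\frac1k\mathbf X_{21}'\mathbf X_{k1}\\ \vdots & \vdots & & \vdots\\ -\frac1k\mathbf X_{k1}'\mathbf X_{11} & -\frac1k\mathbf X_{k1}'\mathbf X_{21} & \cdots & \left(1 - \frac1k\right)\mathbf X_{k1}'\mathbf X_{k1}\end{pmatrix}$$

and

$$\mathbf X_{1.2}' = \begin{pmatrix}\left(1 - \frac1k\right)\mathbf X_{11}' & -\frac1k\mathbf X_{11}' & \cdots & -\frac1k\mathbf X_{11}'\\ -\frac1k\mathbf X_{21}' & \left(1 - \frac1k\right)\mathbf X_{21}' & \cdots & -\frac1k\mathbf X_{21}'\\ \vdots & \vdots & & \vdots\\ -\frac1k\mathbf X_{k1}' & -\frac1k\mathbf X_{k1}' & \cdots & \left(1 - \frac1k\right)\mathbf X_{k1}'\end{pmatrix}.$$

As before,

$$\mathbf S_{11} = \mathbf X_1'\mathbf X_1, \qquad \mathbf S_{12} = \mathbf X_1'\mathbf X_2 = \mathbf S_{21}', \qquad \mathbf S_{22} = \mathbf X_2'\mathbf X_2.$$

The estimates of the correlation effects $\alpha_i$ are not given specifically by Carter. The solution

$$\hat\alpha_i = a_i \quad (i = 1, 2, \ldots, n), \tag{4-19}$$

where

$$a_i = \frac1k\sum_{j=1}^{k}\Big(y_{ji} - \sum_{r=1}^{q}\hat\beta_{jr}x_{jir}\Big),$$

follows directly from

$$\mathbf S_{22}\hat{\boldsymbol\beta}_2 = \mathbf X_2'\mathbf y - \mathbf S_{21}\hat{\boldsymbol\beta}_1. \tag{4-20}$$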


5. EXAMPLE 


As an example of the foregoing results, consider the performance data of a manufactured 
product tested under three environmental conditions (categories) each involving three 
independent variables. In the equation 


$$z_{ji} = y_{ji} - \beta_{j1}x_{ji1} - \beta_{j2}x_{ji2} - \beta_{j3}x_{ji3} - \beta_{j4}x_{ji4}, \tag{5-1}$$

the data $y_{ji}$ and $x_{jir}$ $(r = 1, 2, 3, 4)$ are raw observations, so that $x_{ji1} = 1$ for all $j = 1, 2, 3$
and $i = 1, 2, \ldots, n_j$. In this example $k = 3$, $p = 4$, $n_1 = 16$, $n_2 = 15$ and $n_3 = 16$. The matrices




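$\mathbf S_j$ and $\mathbf X_j'\mathbf y_j$ $(j = 1, 2, 3)$ of the computed sums of squares and products about the origin
are

$$\mathbf S_1 = \begin{pmatrix}16.0 & 286.8 & 139.0 & 4{,}835.0\\ 286.8 & 5{,}340.4 & 2{,}452.2 & 86{,}849.0\\ 139.0 & 2{,}452.2 & 1{,}307.0 & 41{,}990.0\\ 4{,}835.0 & 86{,}849.0 & 41{,}990.0 & 1{,}465{,}575.0\end{pmatrix}, \qquad \mathbf X_1'\mathbf y_1 = \begin{pmatrix}97{,}500\\ 1{,}788{,}052\\ 838{,}010\\ 29{,}484{,}809\end{pmatrix},$$

$$\mathbf S_2 = \begin{pmatrix}15.0 & 244.6 & 236.0 & 4{,}625.0\\ 244.6 & 4{,}181.6 & 3{,}869.0 & 75{,}318.0\\ 236.0 & 3{,}869.0 & 3{,}824.0 & 72{,}500.0\\ 4{,}625.0 & 75{,}318.0 & 72{,}500.0 & 1{,}427{,}425.0\end{pmatrix}, \qquad \mathbf X_2'\mathbf y_2 = \begin{pmatrix}83{,}470\\ 1{,}404{,}814\\ 1{,}320{,}100\\ 25{,}727{,}050\end{pmatrix},$$

$$\mathbf S_3 = \begin{pmatrix}16.0 & 256.0 & 97.0 & 2{,}995.0\\ 256.0 & 4{,}221.7 & 1{,}619.2 & 47{,}897.0\\ 97.0 & 1{,}619.2 & 785.0 & 17{,}840.0\\ 2{,}995.0 & 47{,}897.0 & 17{,}840.0 & 580{,}475.0\end{pmatrix}, \qquad \mathbf X_3'\mathbf y_3 = \begin{pmatrix}89{,}280\\ 1{,}456{,}596\\ 554{,}650\\ 16{,}743{,}450\end{pmatrix},$$

where

$$\mathbf S_j = (s_{jrt}), \quad s_{jrt} = \sum_{i=1}^{n_j}x_{jir}x_{jit} \quad (r, t = 1, 2, 3, 4) \quad\text{and}\quad \mathbf X_j'\mathbf y_j = (s_{jyx_r}), \quad s_{jyx_r} = \sum_{i=1}^{n_j}y_{ji}x_{jir}.$$

It may be noted in the above that the element $s_{111} = n_1$, $s_{211} = n_2$, and $s_{311} = n_3$. The multiple
regression equation for all three categories combined is given by (3-4), where it will be
remembered that specification of the matrices $\mathbf X^{\alpha}$ and $\boldsymbol\beta^{\alpha}$ depends on the model prescribed
by hypothesis. The data in the above matrices can be suitably arranged for analysis
according to hypothesis.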
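
As a small arithmetic check on these matrices, the sketch below uses only their leading elements: the counts $n_j$ and the sums of the $y_{ji}$. When only the constant regressor $x_{ji1} = 1$ is retained, $\mathbf S_j$ reduces to $n_j$ and the normal equations (3-11) give the category means. The variable names are illustrative.

```python
# Leading elements of S_j (the n_j) and of X_j' y_j (the sums of y), from the matrices above.
n = [16, 15, 16]
sum_y = [97500.0, 83470.0, 89280.0]

# Constant-only fit: beta_j1 = (sum of y in category j) / n_j
beta_j1 = [s / nj for s, nj in zip(sum_y, n)]

# Corresponding regression sum of squares: sum_j (sum of y)^2 / n_j
ss_constant_only = sum(s * s / nj for s, nj in zip(sum_y, n))

print([round(b, 2) for b in beta_j1])   # [6093.75, 5564.67, 5580.0]
print(round(ss_constant_only))          # 1556805752
```

The sum of squares obtained this way is the figure 1,556,805,752 that recurs in Table 5-1 for the constant term.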

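To illustrate the statistical method seven hypotheses are considered and tested. The
hypothesis $H_1$ imposes no restriction on the $\beta$'s so that

$$H_1\colon\ \beta_{j1} = \beta_{j1}, \quad \beta_{jr} = \beta_{jr} \quad (r = 2, 3, 4).$$

All other hypotheses are compared as null hypotheses against $H_1$:

$$\begin{aligned}
H_2&\colon\ \beta_{j1} = \beta_{j1}, \quad \beta_{jr} = 0 \quad (r = 2, 3, 4),\\
H_3&\colon\ \beta_{j1} = \beta_{\cdot 1}, \quad \beta_{jr} = \beta_{\cdot r} \quad (r = 2, 3, 4),\\
H_4&\colon\ \beta_{j1} = \beta_{j1}, \quad \beta_{jr} = \beta_{\cdot r} \quad (r = 2, 3, 4),\\
H_5&\colon\ \beta_{j1} = \beta_{j1}, \quad \beta_{jr} = \beta_{jr} \quad (r = 2); \quad \beta_{jr} = \beta_{\cdot r} \quad (r = 3, 4),\\
H_6&\colon\ \beta_{j1} = \beta_{j1}, \quad \beta_{jr} = \beta_{jr} \quad (r = 2); \quad \beta_{jr} = 0 \quad (r = 3, 4),\\
H_7&\colon\ \beta_{j1} = \beta_{j1}, \quad \beta_{jr} = \beta_{\cdot r} \quad (r = 2); \quad \beta_{jr} = \beta_{jr} \quad (r = 3, 4).
\end{aligned}$$

The above statements of the various hypotheses all apply for $j = 1, 2, 3$. In stating these
hypotheses we have specified $\beta_{j1}$ separately, for convenience, since in this example it
represents the constant term which depends on the mean values. Table 5-1 presents the
complete summary of the analysis of variance data and the tests of significance of the
various hypotheses. Table 5-2 presents the estimated regression coefficients of the various
hypotheses. The specification of the matrices $\mathbf X^{\alpha}$ and $\boldsymbol\beta^{\alpha}$ for $H_1$ and $H_5$ is also given, following
Table 5-2; those for the other hypotheses follow on the same lines.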







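Table 5-1. Analysis of variance table for tests of various null hypotheses
$H_\alpha$ ($\alpha$ = 2, 3, 4, 5, 6, 7) against an alternative hypothesis $H_1$

Variation due to | D.F. | Sum of squares (expression) | Sum of squares (value) | $\hat J(1, \alpha)$ | F
$H_2$: $\beta_{j1} = \beta_{j1}$ | $k$ = 3 | | 1,556,805,752 | |
$\beta_{jr} = 0$ (r = 2, 3, 4) | | | | |
Diff. $H_1$, $H_2$ | $pk - k$ = 9 | | 24,992,970 | 870.41 | 96.7**
$H_3$: $\beta_{j1} = \beta_{\cdot 1}$ | 1 | | 1,553,937,500 | |
$\beta_{jr} = \beta_{\cdot r}$ (r = 2, 3, 4) | $p - 1$ = 3 | | 27,027,426 | |
$\hat{\boldsymbol\beta}^3$ | $p$ = 4 | | 1,580,964,926 | |
Diff. $H_1$, $H_3$ | $pk - p$ = 8 | | 833,796 | 29.04 | 3.63**
$H_4$: $\beta_{j1} = \beta_{j1}$ | $k$ = 3 | | 1,556,805,752 | |
$\beta_{jr} = \beta_{\cdot r}$ (r = 2, 3, 4) | $p - 1$ = 3 | | 24,325,402 | |
$\hat{\boldsymbol\beta}^4$ | $p + k - 1$ = 6 | | 1,581,131,154 | |
Diff. $H_1$, $H_4$ | $(p - 1)(k - 1)$ = 6 | | 667,568 | 23.25 | 3.88**
$H_5$: $\beta_{j1} = \beta_{j1}$ | $k$ = 3 | $\mathbf y'\mathbf X_1\mathbf S_{11}^{-1}\mathbf X_1'\mathbf y$ | 1,556,805,752 | |
$\beta_{jr} = \beta_{jr}$ (r = 2) | $k$ = 3 | $\hat{\boldsymbol\beta}_2^{5\prime}\mathbf X_{2.1}'\mathbf y$ | 23,923,966 | |
$\beta_{jr} = \beta_{\cdot r}$ (r = 3, 4) | $p - 2$ = 2 | $\hat{\boldsymbol\beta}_3^{5\prime}\mathbf X_{3.1}'\mathbf y$ | 493,550 | |
$\hat{\boldsymbol\beta}^5$ | $p + 2(k - 1)$ = 8 | $\hat{\boldsymbol\beta}^{5\prime}\mathbf S^5\hat{\boldsymbol\beta}^5$ | 1,581,223,268 | |
Diff. $H_1$, $H_5$ | $(p - 2)(k - 1)$ = 4 | $\hat{\boldsymbol\beta}^{1\prime}\mathbf S^1\hat{\boldsymbol\beta}^1 - \hat{\boldsymbol\beta}^{5\prime}\mathbf S^5\hat{\boldsymbol\beta}^5 = \hat\sigma^2\hat J(1,5)$ | 575,454 | 20.04 | 5.01**
$H_6$: $\beta_{j1} = \beta_{j1}$ | $k$ = 3 | | 1,556,805,752 | |
$\beta_{jr} = \beta_{jr}$ (r = 2) | $k$ = 3 | | 24,349,074 | |
$\beta_{jr} = 0$ (r = 3, 4) | | | | |
$\hat{\boldsymbol\beta}^6$ | $2k$ = 6 | | 1,581,154,826 | |
Diff. $H_1$, $H_6$ | $(p - 2)k$ = 6 | | 643,896 | 22.42 | 3.74**
$H_7$: $\beta_{j1} = \beta_{j1}$ | $k$ = 3 | | 1,556,805,752 | |
$\beta_{jr} = \beta_{\cdot r}$ (r = 2) | 1 | | 2,450,676 | |
$\beta_{jr} = \beta_{jr}$ (r = 3, 4) | $2(p - 1)$ = 6 | | 22,461,479 | |
$\hat{\boldsymbol\beta}^7$ | $2p + k - 1$ = 10 | | 1,581,717,907 | |
Diff. $H_1$, $H_7$ | $p(k - 2) - (k - 1)$ = 2 | | 80,815 | 2.81 | 1.41
$H_1$: $\beta_{j1} = \beta_{j1}$ | $k$ = 3 | $\mathbf y'\mathbf X_1\mathbf S_{11}^{-1}\mathbf X_1'\mathbf y$ | 1,556,805,752 | |
$\beta_{jr} = \beta_{jr}$ (r = 2, 3, 4) | $pk - k$ = 9 | $\hat{\boldsymbol\beta}_2^{1\prime}\mathbf X_{2.1}'\mathbf y$ | 24,992,970 | |
$\hat{\boldsymbol\beta}^1$ | $pk$ = 12 | $\hat{\boldsymbol\beta}^{1\prime}\mathbf S^1\hat{\boldsymbol\beta}^1$ | 1,581,798,722 | |
Diff. | $N - pk$ = 35 | $\mathbf y'\mathbf y - \hat{\boldsymbol\beta}^{1\prime}\mathbf S^1\hat{\boldsymbol\beta}^1 = (N - pk)\hat\sigma^2$ | 1,004,984 | $\hat\sigma^2$ = 28,714 |
Total | $N = n_1 + n_2 + n_3$ = 47 | $\mathbf y'\mathbf y$ | 1,582,803,706 | |

** Significance at 0.01 probability level.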
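
The last two columns of Table 5-1 follow from the sum of squares column by simple division, as the sketch below illustrates: each difference is divided by the residual mean square to give $\hat J(1, \alpha)$, and then by its degrees of freedom to give F. The dictionary layout and names are illustrative only; the figures are those of the table.

```python
# Differences in regression sums of squares (H_1 minus H_a) and their D.F., from Table 5-1.
diffs = {2: (24_992_970, 9), 3: (833_796, 8), 4: (667_568, 6),
         5: (575_454, 4), 6: (643_896, 6), 7: (80_815, 2)}
residual_ss, residual_df = 1_004_984, 35

sigma2 = residual_ss / residual_df          # estimate of sigma^2, about 28,714
results = {}
for a, (ss, df) in diffs.items():
    J = ss / sigma2                         # J(1, a)
    F = J / df                              # referred to F with (df, 35) degrees of freedom
    results[a] = (J, F)
    print(f"H_{a}: J = {J:.2f}, F = {F:.2f}")
```

Small discrepancies in the last digit relative to the printed table come from rounding $\hat\sigma^2$ to 28,714 before dividing.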


Using the 0.01 probability level for significance, and the 0.05 probability level for caution,
it is concluded, from Table 5-1, that

(1) The regression is real; reject $H_2$.

(2) One set of regression coefficients, including equality of means, cannot adequately
represent all three categories; reject $H_3$.













(3) One set of regression coefficients is not adequate even after allowing for differences
in mean value for each category; reject $H_4$.

(4) One set of regression coefficients for variables $x_3$ and $x_4$ for all three categories cannot
be used; reject $H_5$, further

(5) the regression coefficients for $x_3$ and $x_4$ cannot be ignored; reject $H_6$.

However,

(6) the use of one regression coefficient for the variable $x_2$ and different ones for $x_3$ and
$x_4$ and for the constant term is adequate; accept $H_7$.


Table 5-2. Estimates of regression coefficients under various hypotheses

Hypothesis | $\hat\beta_{j1}$ | $\hat\beta_{j2}$ | $\hat\beta_{j3}$ | $\hat\beta_{j4}$
$H_1$: j = 1 | 3586 | 203.4 | -10.69 | 3.46
j = 2 | -7186 | 231.1 | 79.02 | 25.10
j = 3 | 1654 | 227.7 | -6.93 | 1.73
$H_2$: j = 1 | 6094 | | |
j = 2 | 5564 | | |
j = 3 | 5580 | | |
$H_3$: j = 1, 2, 3 | 2009 | 219.8 | 11.19 | 0.647
$H_4$: j = 1 | 1803 | 216.0 | 3.71 | 1.23
j = 2 | 1589 | 216.0 | 3.71 | 1.23
j = 3 | 1862 | 216.0 | 3.71 | 1.23
$H_5$: j = 1 | 1349 | 206.9 | 34.64 | 2.43
j = 2 | 617 | 224.0 | 34.64 | 2.43
j = 3 | 1625 | 205.6 | 34.64 | 2.43
$H_6$: j = 1 | 2467 | 202.3 | |
j = 2 | 1873 | 226.4 | |
j = 3 | 2001 | 223.7 | |
$H_7$: j = 1 | 3431 | 219.8 | [illegible] | -4.10
j = 2 | -6767 | 219.8 | 79.26 | 24.33
j = 3 | 1758 | 219.8 | [illegible] | 1.77


For the hypotheses $H_1$ and $H_5$ considered in the example the matrix of parameters $\boldsymbol\beta$,
and the matrix of observations $\mathbf X$ are given below. It will be noted that since we are dealing
with raw observations in the example, the regression coefficients $\beta_{j1}$ of $\boldsymbol\beta$ and the vector
$\mathbf x_{j1}$ of $\mathbf X$ $(j = 1, 2, 3)$ were partitioned for every hypothesis. This provides for the usual
practice of obtaining sums of squares and products of deviations about average values to



simplify further calculations by reducing by one the rank of matrices S (of sums of squares 
and products) whose inverses must be obtained. 


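$$H_1\colon\ \beta_{j1} = \beta_{j1}, \quad \beta_{jr} = \beta_{jr} \quad (r = 2, 3, 4;\ j = 1, 2, 3),$$

$$\boldsymbol\beta' = (\boldsymbol\beta_1', \boldsymbol\beta_2'), \qquad \mathbf X = (\mathbf X_1, \mathbf X_2),$$

$$\boldsymbol\beta_1' = (\beta_{11}, \beta_{21}, \beta_{31}), \qquad \boldsymbol\beta_2' = (\boldsymbol\beta_{12}', \boldsymbol\beta_{22}', \boldsymbol\beta_{32}'), \qquad \boldsymbol\beta_{j2}' = (\beta_{j2}, \beta_{j3}, \beta_{j4}),$$

$$\mathbf X_1 = \begin{pmatrix}\mathbf x_{11} & \mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf x_{21} & \mathbf 0\\ \mathbf 0 & \mathbf 0 & \mathbf x_{31}\end{pmatrix}, \qquad \mathbf X_2 = \begin{pmatrix}\mathbf X_{12} & \mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf X_{22} & \mathbf 0\\ \mathbf 0 & \mathbf 0 & \mathbf X_{32}\end{pmatrix},$$

$$\mathbf x_{j1}' = (1, 1, \ldots, 1) \text{ of order } 1 \times n_j, \qquad \mathbf X_{j2} = (\mathbf x_{j2}, \mathbf x_{j3}, \mathbf x_{j4}), \qquad \mathbf x_{jr}' = (x_{j1r}, x_{j2r}, \ldots, x_{jn_jr}),$$

$$H_5\colon\ \beta_{j1} = \beta_{j1}, \quad \beta_{jr} = \beta_{jr} \quad (r = 2), \quad \beta_{jr} = \beta_{\cdot r} \quad (r = 3, 4;\ j = 1, 2, 3),$$

$$\boldsymbol\beta' = (\boldsymbol\beta_1', \boldsymbol\beta_2', \boldsymbol\beta_3'), \qquad \mathbf X = (\mathbf X_1, \mathbf X_2, \mathbf X_3),$$

$$\boldsymbol\beta_1' = (\beta_{11}, \beta_{21}, \beta_{31}), \qquad \boldsymbol\beta_2' = (\beta_{12}, \beta_{22}, \beta_{32}), \qquad \boldsymbol\beta_3' = (\beta_{\cdot 3}, \beta_{\cdot 4}),$$

$$\mathbf X_1 = \begin{pmatrix}\mathbf x_{11} & \mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf x_{21} & \mathbf 0\\ \mathbf 0 & \mathbf 0 & \mathbf x_{31}\end{pmatrix}, \qquad \mathbf X_2 = \begin{pmatrix}\mathbf x_{12} & \mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf x_{22} & \mathbf 0\\ \mathbf 0 & \mathbf 0 & \mathbf x_{32}\end{pmatrix}, \qquad \mathbf X_3 = \begin{pmatrix}\mathbf X_{13}\\ \mathbf X_{23}\\ \mathbf X_{33}\end{pmatrix}, \qquad \mathbf X_{j3} = (\mathbf x_{j3}, \mathbf x_{j4}),$$

$\mathbf x_{j1}$ and $\mathbf x_{jr}$ $(r = 2, 3, 4)$ are defined as under $H_1$.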

In the foregoing example each hypothesis on the parameters applied to all categories, 
j = 1,2,3. It is clear, however, that this need not be the case for the theory and method are 
equally applicable for any assertion of the hypotheses with regard to the parameters. For 
example, we might have considered a case where part of the hypothesis concerned equality 
of the parameters for certain of the categories, but not for all, e.g. 


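$$\begin{aligned}
H_8\colon\ \beta_{jr} &= \beta_{\cdot r} \quad (j = 1, 3;\ r = 1),\\
\beta_{jr} &= \beta_{jr} \quad (j = 2;\ r = 1),\\
\beta_{jr} &= \beta_{\cdot r} \quad (j = 1, 2, 3;\ r = 2),\\
\beta_{jr} &= \beta_{jr} \quad (j = 1, 2, 3;\ r = 3, 4),
\end{aligned}$$

and analysis by the three-partition sub-hypothesis procedure of §4 would apply.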


The authors wish to express their sincere appreciation to the computing staff at the Naval 
Proving Ground, Dahlgren, Virginia, and to Mr Fred Okano for their able assistance in 
carrying out the computations. 


REFERENCES 


Carter, A. H. (1949). The estimation and comparison of residual regressions where there are two or
more related sets of observations. Biometrika, 36, 26-46.

Frazer, R. A., Duncan, W. J. & Collar, A. R. (1938). Elementary Matrices. Cambridge University
Press.

Grundy, P. M. (1951). A general technique for the analysis of experiments with incorrectly treated
plots. J. R. Statist. Soc. B, 13, 272-83.

Kempthorne, O. (1952). The Design and Analysis of Experiments. New York: John Wiley and Sons,
Inc.










Kendall, M. G. (1946). The Advanced Theory of Statistics, 2. London: Chas. Griffin and Co., Ltd.

Kołodziejczyk, S. (1935). On an important class of statistical hypotheses. Biometrika, 27, 161-90.

Kullback, S. (1952). An application of information theory to multivariate analysis. Ann. Math.
Statist. 23, 88-102.

Kullback, S. (1956). An application of information theory to multivariate analysis. II. Ann. Math.
Statist. 27, 122-46.

Kullback, S. & Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statist. 22, 79-86.

Pearson, E. S. & Wilks, S. S. (1933). Methods of statistical analysis appropriate for k samples of
two variables. Biometrika, 25, 353-78.

Plackett, R. L. (1949). A historical note on the method of least squares. Biometrika, 36, 458-60.

Rao, C. R. (1952). Advanced Statistical Methods in Biometric Research. New York: John Wiley and
Sons, Inc.

Rosenblatt, H. M. (1953). On a k sample multivariate regression problem. Master's Thesis, The
George Washington University.

Welch, B. L. (1935). Problems in the analysis of regression among k samples. Biometrika, 27, 145-60.

Wilks, S. S. (1943). Mathematical Statistics. Princeton University Press.

Williams, E. J. (1953). Tests of significance for concurrent regression lines. Biometrika, 40, 297-305.










BIVARIATE STRUCTURAL RELATION 


By R. L. BROWN 


British Coal Utilization Research Association, Leatherhead, Surrey 


SUMMARY. Given n observations $(x_i, y_i)$, each subject to an error of measurement ascertained
independently, the existence of a structural relation for the true points (X, Y) is discussed. For a
linear relation a $\phi$-number is defined from which either a confidence or a structural theory can be
developed. The relation is treated in toto. Structural theory is described in terms of three hypotheses
of regularity, which enable confidence and fiducial theories to be contrasted. It is shown that the theory
provides a solution for the curvilinear relation wherein all the alternative hypotheses are incorporated,
and it is suggested that the latter is a necessary requirement. The alternative hypothesis implicit in
the $\phi$-number treatment of the linear relation is deduced from the general theory.


1. THE PROBLEM 


Given n measurements $(x_i, y_i)$ of two variates, each subject to error, and some theoretical basis
for a structural relation governing the true points $(X_i, Y_i)$, it is a fundamental problem in the
physical and engineering sciences (Brown, 1955) to establish whether or not the structural 
relation is in accord with the evidence. One simple form of this problem is considered in the 
present paper; it is supposed that the conduct of the experiment has been judged reliable 
in the context of existing knowledge and that the data are so organized that the distribution 
of errors can be ascertained and adopted as a quantitative measure of reliability. The 
problem thus formulated was suggested by an experiment on coal breakage (Brown, 1947). 

Previous work has been limited mainly to the linear relation and the main objective 
has been to estimate a best relation. When the method of maximum likelihood is used it is 
found that the best relation passes through the mean of the observations; the limits obtained 
for the coefficients defining the relation are then conditional on this requirement. More- 
over, it is usual to find limits for the coefficients separately; this procedure destroys the 
structurality of the relation; it is essential to treat the relation in toto. Attempts to find 
relations amongst variates in the absence of independent information on the errors show 
that it is then necessary to assume that the relation exists, leading to doubtful inverted 
arguments; Moran (1956) has recently shown that the relation is then unidentifiable. It is 
well known that in the linear case the ratio of the error variances suffices for the estimation 
of a best relation; here the alternative hypotheses are not obvious and it is even doubtful 
if they can be formulated. 


2. LINEAR RELATION 
2-1. $\phi$-number


If the data are normalized and the error* in x is distributed N(0, 1) and that in y as N(0, 1),
then the perpendicular from an observation (x, y) on to a line

$$Y = \alpha_0 + \alpha_1X \tag{2-1}$$

is distributed N(0, 1). Since, as Lindley (1947) has pointed out, it is unnecessary to say
anything about the true points (X, Y), part of the difficulty of the problem is removed.
Putting

$$n\phi = \frac{\sum_i(-y_i + \alpha_0 + \alpha_1x_i)^2}{1 + \alpha_1^2}, \tag{2-2}$$


* This conventional formulation is discussed further in § 3-1. 





we know that $n\phi$ has a $\chi^2$-distribution $\pi(\phi)\,d\phi$. Then the numerical* statement,

$$\Pr(\phi < \phi_p) = \int_0^{\phi_p}\pi(\phi)\,d\phi = p, \tag{2-3}$$

associates a probability p with a chosen $\phi_p$. Notice that the differential element $d\phi$ is mixed,
depending on both the observations and the coefficients of the line. Writing $nS_{xx}$, $nS_{xy}$, etc.,
for the sum of squares and products of the observations and taking the origin at the mean,
we have

$$\phi = \frac{S_{yy} + \alpha_0^2 - 2\alpha_1S_{xy} + \alpha_1^2S_{xx}}{1 + \alpha_1^2}. \tag{2-4}$$

A confidence region for the coefficients $(\alpha_0, \alpha_1)$ jointly can be written down immediately
from equation (2-4). It is the non-central conic,

$$\alpha_1^2(S_{xx} - \phi_p) - 2\alpha_1S_{xy} + \alpha_0^2 = (\phi_p - S_{yy}). \tag{2-5}$$

The coefficients $(\alpha_0, \alpha_1)$ of all lines for which $\phi < \phi_p$ give points lying inside the conic, leading
to the usual assertion that the coefficients lie in this region in a proportion p of cases in the
long run. The region is bounded if the conic is an ellipse.†
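
The equivalence of the two forms (2-2) and (2-4), and the use of (2-5) as a membership test for a candidate pair $(\alpha_0, \alpha_1)$, can be sketched as follows. The function names and the small data set are hypothetical and serve only to illustrate the formulas as reconstructed above.

```python
def n_phi_direct(x, y, a0, a1):
    """n*phi from (2-2): squared perpendicular distances to Y = a0 + a1*X."""
    return sum((-yi + a0 + a1 * xi) ** 2 for xi, yi in zip(x, y)) / (1 + a1 ** 2)

def phi_from_moments(x, y, a0, a1):
    """phi from (2-4); x and y must already be measured from their means."""
    n = len(x)
    sxx = sum(xi * xi for xi in x) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    syy = sum(yi * yi for yi in y) / n
    return (syy + a0 ** 2 - 2 * a1 * sxy + a1 ** 2 * sxx) / (1 + a1 ** 2)

def in_confidence_region(x, y, a0, a1, phi_p):
    """True when (a0, a1) lies inside the conic (2-5), i.e. phi <= phi_p."""
    return phi_from_moments(x, y, a0, a1) <= phi_p

# Illustrative data (hypothetical), shifted so that the origin is at the mean.
raw_x = [1.0, 2.0, 3.0, 4.0, 5.0]
raw_y = [2.1, 3.9, 6.2, 7.8, 10.0]
mx, my = sum(raw_x) / 5, sum(raw_y) / 5
x = [v - mx for v in raw_x]
y = [v - my for v in raw_y]

lhs = n_phi_direct(x, y, 0.1, 2.0)
rhs = len(x) * phi_from_moments(x, y, 0.1, 2.0)
print(abs(lhs - rhs) < 1e-9)
```

The agreement depends on taking the origin at the mean, since the cross terms in $\alpha_0$ then vanish.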


2-2. Acceptance conic 


Confidence theory is not developed further in this study of structural relations since it
does not correspond to the habit of thought of a physicist who pictures the problem as
ill-defined graphs in the space of true points (X, Y). He would regard the coefficients
$(\alpha_0, \alpha_1)$, even if predicted from a valid mathematical theory (Jeffreys, 1948, pp. 283-5),
as uncertain and would wish to translate the $\phi$-number (2-4) in terms of the probability
statement (2-3) into a family of lines acceptable at probability level p (cf. §3-1). Such a
transformation of the probability distribution into true point space is easily effected by
finding the envelope of lines (2-1) subject to the constraint (2-4) on the coefficients. This
gives the acceptance conic,

$$\frac{(Y - a_1X)^2}{\phi_p - \phi_1} - \frac{(a_1Y + X)^2}{\phi_2 - \phi_p} = (1 + a_1^2), \tag{2-6}$$

where

$$\phi_1, \phi_2 = \tfrac12\left[(S_{yy} + S_{xx}) \mp \{(S_{yy} - S_{xx})^2 + 4S_{xy}^2\}^{1/2}\right], \tag{2-7}$$

$$a_1 = \frac{S_{xy}}{\phi_2 - S_{yy}}, \qquad \phi_2 = S_{xx} + a_1S_{xy}, \tag{2-8}$$

and it is assumed that $a_1 > 0$, $\phi_1 < \phi_2$. Equation (2-6) does not contain the unknown coefficients
$(\alpha_0, \alpha_1)$.

The acceptance conic could be regarded as a transformation into true point space of the
confidence conic (2-5), and, as such, it could be given a confidence foundation. This view is
supported by the disappearance of the unknown coefficients $(\alpha_0, \alpha_1)$ in the process of
obtaining the envelope. But the envelope is also to be regarded as the locus of all lines in
true point space having $\phi = \phi_p$, and the locus is determined for the particular observations
$(x_i, y_i)$. Thus an interpretation analogous to fiducial theory appears feasible. In the author's

* In the sense of indifference to content as opposed to numerical measure; cf. the third regularity
of § 3-1.

† The limits are the same as those found in § 2-2.










view, however, the fiducial method is not applicable since structural coefficients differ
in kind from population parameters; in § 3-1 it is suggested that the acceptance conic should
be interpreted in terms of new hypotheses of regularity. Meanwhile, it may be noted that,
on account of the existence of a $\phi$-number, the alternative interpretations yield the same
limits for $(\alpha_0, \alpha_1)$. It appears worth while to show that the acceptance conic has satisfactory
properties. These are:

(i) the relation has been treated in toto,

(ii) the central conics for $\phi_p$ differing are confocal,

(iii) for $\phi_1 < \phi_p < \phi_2$, the conic is an hyperbola and all lines having $\phi < \phi_p$ lie inside the
hyperbola, which therefore bounds all acceptable relations,

(iv) for $\phi_1 > \phi_p$, the conic is imaginary and it follows that there is no acceptable linear
relation,

(v) for $\phi_p > \phi_2$, the conic is an ellipse and every line outside the ellipse has $\phi > \phi_p$; here
it might be said that the line is indeterminate in direction, since the range of the data
was not large enough in comparison with the errors; the ellipse is then the limiting boundary
of the mean of the observations.
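
Properties (iii) to (v) amount to a three-way classification of the acceptance conic by the position of $\phi_p$ relative to $\phi_1$ and $\phi_2$. A minimal sketch is given below, assuming that $\phi_1$ and $\phi_2$ are the smaller and larger roots given by (2-7); the function names and the numerical moments are hypothetical, chosen only to exercise the three branches.

```python
import math

def principal_phis(sxx, sxy, syy):
    """Smaller and larger roots phi_1, phi_2 of (2-7)."""
    root = math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)
    return 0.5 * (sxx + syy - root), 0.5 * (sxx + syy + root)

def conic_type(phi_p, phi_1, phi_2):
    """Classify the acceptance conic (2-6) for a chosen phi_p."""
    if phi_p < phi_1:
        return "imaginary"      # property (iv): no acceptable linear relation
    if phi_p < phi_2:
        return "hyperbola"      # property (iii): bounds all acceptable relations
    return "ellipse"            # property (v): direction indeterminate

# Hypothetical moments: the roots are 1.0 and 7.0.
phi_1, phi_2 = principal_phis(sxx=4.0, sxy=3.0, syy=4.0)
kinds = [conic_type(p, phi_1, phi_2) for p in (0.5, 2.0, 9.0)]
print(phi_1, phi_2, kinds)
```

Boundary cases where $\phi_p$ equals one of the roots are degenerate and are assigned here, arbitrarily, to the branch that follows.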

When the acceptance conic is an hyperbola, the limits for $\alpha_1$ are given by the asymptotes

$$\text{limits } \alpha_1 = \frac{a_1\sqrt{(\phi_2 - \phi_p)} \pm \sqrt{(\phi_p - \phi_1)}}{\sqrt{(\phi_2 - \phi_p)} \mp a_1\sqrt{(\phi_p - \phi_1)}}, \tag{2-9}$$

and for the intercept I at X = h,

$$kI = a_1h(\phi_2 - \phi_1) \pm \left[(1 + a_1^2)(\phi_2 - \phi_p)(\phi_p - \phi_1)(k + h^2 + a_1^2h^2)\right]^{1/2}, \tag{2-10}$$

where $k = (\phi_2 - \phi_p) - a_1^2(\phi_p - \phi_1)$.

These intercepts do not belong to parallel lines. Notice that they depend on one probability
level p. The axes of the hyperbola are

$$Y - a_1X = 0, \qquad a_1Y + X = 0, \tag{2-11}$$


and these can be called respectively the ‘best’ and ‘worst’ lines through the mean of the 
observations. But the ‘best’ line is not here estimated in the usual sense. 
Next, if r is the correlation coefficient of the observations, the condition $\phi_1 < \phi_p$ for the
existence of acceptable lines can be written

$$r^2 > \left(1 - \frac{\phi_p}{S_{xx}}\right)\left(1 - \frac{\phi_p}{S_{yy}}\right), \tag{2-12}$$

it being intuitive that $\phi_p < S_{xx}, S_{yy}$. This might be regarded as a test for a unit population
correlation ($\rho = 1$). In the example of § 2-3 below, $r^2 = 0.94$ and $\left(1 - \frac{\phi_p}{S_{xx}}\right)\left(1 - \frac{\phi_p}{S_{yy}}\right) = 0.90$
at p = 0.95. Of course, (2-12) makes use of the errors of measurement and provides a
test different from that derived from the bivariate normal distribution, wherein $\rho$ cannot
be unity.
Lastly, the method may be shown to be consistent. Thus the pairs of sums

$$S_x,\ S_X; \quad S_y,\ (\alpha_1S_X + \alpha_0); \quad S_{xx},\ (S_{XX} + 1); \quad S_{xy},\ \alpha_1S_{XX}; \quad S_{yy},\ (\alpha_1^2S_{XX} + \alpha_0^2 + 1),$$

converge in probability to the same values, provided the errors are not correlated to the
observations. Hence the pairs $a_1, \alpha_1$; $\phi_1, 1$; $\phi_2, S_{XX}(1 + \alpha_1^2)$, converge in probability to the
same values. Also the hyperbola tends in the limit to the true line $Y = \alpha_1X$ repeated.




The hyperbola is defined by the five statistics $\bar x$, $\bar y$, $a_1$, $\phi_1$, $\phi_2$, and these are unbiased and
consistent in probability. These statistics give a sufficient expression of the data for the
purpose of providing, through the $\phi$-number, a test for the existence of a linear structural
relation. Nevertheless, I consider that $n\phi$ has a $\chi^2_n$-distribution. This question of degrees
of freedom is discussed more fully in the extension of the problem to acceptance quadrics
in m dimensions which it is hoped to publish in due course.


2-3. Example
The following data were prepared with the aid of a table of random normal deviates, the
errors in x and in y being taken N(0, 1):

x | 1.8 | 4.1 | 5.8 | 7.5 | 9.3 | 10.6 | 13.4 | 14.7 | 18.9
y | 6.9 | 12.5 | 20.0 | 15.7 | 24.9 | 23.4 | 30.2 | 35.6 | 39.1

We find

$n = 9$, $\bar x = 9.57$, $\bar y = 23.14$,
$nS_{xx} = 238$, $nS_{xy} = 451$, $nS_{yy} = 906$,
$n\phi_1 = 11$, $n\phi_2 = 465$, $a_1 = 1.98$.

Now $\chi^2_n$ at p = 0.95 gives $n\phi_p = 20$. Obviously there are acceptable straight lines, the
acceptance hyperbola being

$$(Y - 2X)^2 - 0.02(2Y + X)^2 = 5,$$

referred to the mean as origin. The limits for the slope are

$$\alpha_1 = 1.5 \text{ to } 2.7,$$

and for $\alpha_0$ (at h = 0)

$$\alpha_0 = \pm 2.2.$$

The true line $(Y - \bar y) = 2(X - \bar x) - 1$, from which the data were derived, does not meet the
acceptance hyperbola. (See Fig. 1.)
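
The basic sums quoted in this example can be recomputed from the nine observations, as in the sketch below. The variable names are illustrative, and the value $n\phi_p = 20$ is simply taken from the text; the recomputed sums agree with the quoted figures to within about one unit of rounding.

```python
x = [1.8, 4.1, 5.8, 7.5, 9.3, 10.6, 13.4, 14.7, 18.9]
y = [6.9, 12.5, 20.0, 15.7, 24.9, 23.4, 30.2, 35.6, 39.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and products about the mean (n*S_xx, n*S_xy, n*S_yy)
n_sxx = sum((v - x_bar) ** 2 for v in x)
n_sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
n_syy = sum((v - y_bar) ** 2 for v in y)

r2 = n_sxy ** 2 / (n_sxx * n_syy)          # squared correlation coefficient

n_phi_p = 20.0                             # value quoted in the text
bound = (1 - n_phi_p / n_sxx) * (1 - n_phi_p / n_syy)

print(round(x_bar, 2), round(y_bar, 2))    # 9.57 23.14
print(round(n_sxx), round(n_sxy), round(n_syy), round(r2, 2), round(bound, 2))
```

The squared correlation and the product in the last line are the two quantities compared in § 2-2 for this example.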


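2-4. Regression case ($\sigma_x = 0$, $\sigma_y = 1$)


Here

$$n\phi = \sum_i(-y_i + \alpha_0 + \alpha_1x_i)^2 \tag{2-13}$$

has a $\chi^2$-distribution. Putting

$$a_1 = \frac{S_{xy}}{S_{xx}}, \qquad w = \frac{S_{xx}S_{yy} - S_{xy}^2}{S_{xx}} = \frac{1}{2n^2S_{xx}}\sum_i\sum_j\begin{vmatrix}x_i & y_i & 1\\ x_j & y_j & 1\\ \bar x & \bar y & 1\end{vmatrix}^2, \tag{2-14}$$

the acceptance conic is

$$\frac{(Y - a_1X)^2}{(\phi_p - w)} - \frac{X^2}{S_{xx}} = 1, \tag{2-15}$$

which may be interpreted as before. Notice that in w the type term is zero if the pair of
points $(x_i, y_i)$, $(x_j, y_j)$ lie on a line through the mean. The 'best' line is the regression line
$Y - a_1X = 0$. The limits for $\alpha_1$ are

$$\text{limits } \alpha_1 = a_1 \pm \left\{\frac{\phi_p - w}{S_{xx}}\right\}^{1/2}, \tag{2-16}$$

and for the intercept I at X = h,

$$I = a_1h \pm \left\{(\phi_p - w)\left(1 + \frac{h^2}{S_{xx}}\right)\right\}^{1/2}. \tag{2-17}$$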













Increasing $S_{xx}$ narrows the range of both these limits, an outcome that corresponds to a
usual expectation of a physicist.

When the error in the y-direction is based on a limited number of degrees of freedom, the
same result holds with Snedecor's F-distribution replacing the $\chi^2$-distribution of $\phi$.


[Figure: plot of the observations, the true line and the acceptance hyperbola.]

Fig. 1. Example of acceptance hyperbola derived from a $\phi$-number.


2-5. Further development
Consider now a case when a $\phi$-number having a simple probability distribution cannot
be found. This arises when estimates, each on m degrees of freedom, are available for the
error variances. Suppose the data to have been studentized. Then if $p_i$ is the perpendicular
from the ith observation on to the line (2-1),

$$p_i = \delta x_i\sin\theta - \delta y_i\cos\theta, \tag{2-18}$$

where $\delta x_i$, $\delta y_i$ are the errors of the ith observation. Hence assuming there is no bias in the
measurements, $p_i$ is distributed $N(0, \sigma_x^2\sin^2\theta + \sigma_y^2\cos^2\theta)$, and thence $n\phi$ in (2-2) is distributed
$\chi^2_n(\sigma_x^2\sin^2\theta + \sigma_y^2\cos^2\theta)$.

Now if $\delta x_k$, $\delta y_k$ are the errors in x, y from which the estimates $s_x^2 = s_y^2 = 1$ were derived,

$$p_k = \delta x_k\sin\theta - \delta y_k\cos\theta \tag{2-19}$$








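is normally distributed, and therefore $\sum_1^m p_k^2$ is distributed as chi-squared. But

$$E\Big(\sum_1^m p_k^2\Big) = m(\sigma_x^2\sin^2\theta + \sigma_y^2\cos^2\theta), \tag{2-20}$$

provided

$$E(\delta x_k\,\delta y_k) = 0, \tag{2-21}$$

which is usually the case. Since the observations $(x_i, y_i)$ are studentized,

$$\sum_1^m\delta x_k^2 = \sum_1^m\delta y_k^2 = m. \tag{2-22}$$

Put

$$\sum_1^m\delta x_k\,\delta y_k = mh. \tag{2-23}$$

Then

$$\sum_1^m(\delta x_k\sin\theta - \delta y_k\cos\theta)^2 = m(1 - 2h\sin\theta\cos\theta) \tag{2-24}$$

has a $\chi^2_m(\sigma_x^2\sin^2\theta + \sigma_y^2\cos^2\theta)$ distribution. Thus

$$f = \frac{\phi}{1 - \dfrac{2h\alpha_1}{1 + \alpha_1^2}} \tag{2-25}$$

has an F(n, m) distribution.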

This result is of little value since the quantity h would not usually be known. h has no 
particular physical significance under usual conditions of experimentation. 

Further complications arise when (a) it is not assumed that the measurements are free
from bias, (b) the degrees of freedom with which $s_x^2$, $s_y^2$ are determined are not equal.

In such cases, the methods proposed here, coupled with approximate distribution theory, 
would appear to lead to approximate tests. Perhaps the lack of an exact statistical test is 
not unexpected, since the organization of additional information in the form of errors of 
measurement requires a combination of scientific and probabilistic languages that merits 
deep consideration.* 


3. CURVILINEAR RELATIONS 
3-1. The three hypotheses of regularity 


The simplicity of the linear problem is due to the existence of a $\phi$-number, leading to a
confidence theory or to an acceptance conic. But any satisfactory approach to structural
relations, as they are understood by physicists and engineers, must be capable of leading to
curvilinear systems. A more general treatment is necessary even in the case of the linear
relation, for it will have been noted that the alternative hypothesis of a structural, but
non-linear, relation is not encompassed in the derivation from the $\phi$-number. A new theory
will now be given for the general relation

$$Y = f(X, \alpha_j) \quad (j = 0, 1, \ldots, s - 1). \tag{3-1}$$


For a physicist, there is a pattern of relationships underlying his scientific thought. In 
other words, the disposable parameters $\alpha_j$ are not uniquely determined; their values are


* Kendall (1951, 1952) and others have approached tentatively this question in their discussion of 
the different types of variate that are encountered in practice. See also § 3-1. 








90 Bivariate structural relation 


uncertain.* Moreover, the functional forms f(X,«;) can translate into each other, usually, 
however, within a limited class of functions such as wave or power laws. 

The meaning of (3-1) can be clarified by enunciating three hypotheses of regularity: 

(i) Regularity in the succession of observational data. This hypothesis is implicit in
confidence theory in so far as it is considered to yield results valid in the long run. Adopting
a much simplified argument, if X is a true value, x an observation, then

$$\epsilon = (X - x) \tag{3-2}$$

defines an error $\epsilon$. In confidence theory X is regarded as knowable absolutely and the
relation $d\epsilon = -dx$ puts all the error on to the observational data. Then the probability
distribution $p(\epsilon)\,d\epsilon$ is regarded as established by repeated observation.

(ii) Regularity in the totality of coexisting patterns of structural relations. If the observational 
data are regarded as known and fixed, i.e. a subset of all possible data is agreed to be chosen, 
then the relation $d\epsilon = dX$ yields what may be called a structural theory, the development
of which has been the objective of this paper. As an example, consider the class of patterns 
of structural relations that are linear. This class is abstracted from a wider class on the 
supposition that the disturbances of linearity are many and small. Such abstraction may be 
considered valid if account be taken of the probability distribution $p(\epsilon)\,d\epsilon$, found now from
the central limit theorem and not by repeated observation. 

(iii) Regularity in the recurrent alternations between underlying patterns and observational 
data. Alternation of theory and experiment is a recognized element in scientific work. It 
reconciles the opposition of the first two regularities, taking into account that patterns in
their totality and the observational data in all their possible actualizations are both
unknowable, though for different reasons. Recurrent alternation is the true source of the error
$\epsilon$ and its probability distribution, which strictly apply to neither of the first two regularities.
If $\epsilon$ is distributed $N(\mu, \sigma)$, then the relation $d\epsilon = d\mu$ would appear to lead to fiducial theory;
the author at present considers that the fiducial theory of the location parameter $\mu$ is often
a structural theory in the sense defined here, whereas that of the scale parameter $\sigma$ may
be a distinct theory, neither confidence nor structural.

It will be noticed that the first regularity cannot in fact yield results ‘in the long run’, 
although the introduction of randomization procedures can in part overcome this difficulty. 
Thus it is not straightforward to accept it without appealing tacitly to an intuitive apprecia- 
tion of the determination of occasions in the framework of space-time, with a consequent 
limited notion of repetition. A generalized notion of repetition is needed to accommodate 
all three regularities and this requires an ampler framework, such as has been proposed, for 
quite independent reasons, by physical scientists.† In such a framework all three regularities
must be accepted as determining what can be known; they gain intelligibility through the
relativistic notion of repetition due to Bennett (1957), namely, that the moment of
observation, with its patterns of potential possibilities, is multi-valued in its actualizations.

* Jeffreys (1948, p. 60) identifies a true value with the mean of the probability distribution; but later
(pp. 268-76), in discussing systematic errors, he distinguishes the true value from the location parameter.
The distinction is not trivial. See also the 'Principle of Limited Variety' in C. D. Broad's presidential
address to the Aristotelian Society (1927-8).

† For example, Kaluza (1921), Caldirola (1942), Bennett, Brown & Thring (1949), and others have
explored the scope offered by a five-dimensional framework for the representation of physical events.
Bennett has emphasized especially the anti-time-like character of a fifth dimension. Podolanski
(1950) and Bennett (1953) have proposed a six-dimensional framework. Bennett's precise
characterization of the two extra dimensions gives a clear basis for the three regularities cited here.
Of course, all three regularities determine every occasion.

It follows from the second regularity that in a structural theory the relationship should 
be treated in toto and should embody the alternative relationships within the class 
abstracted from the totality of all possible patterns. 

The question then arises as to how an independent and relevant assessment of the prob- 
ability distribution can be made. This might be derived theoretically or experimentally. 
The former rests on the central limit theorem. In the latter case, consider the coal
experiment that led to this investigation, where the errors were obtained from measurements of
sub-populations clearly defined by an attribute meaningful in respect to the measurements
x, y and the structural relation under consideration. The attribute was determined by the
third regularity. In the absence of such an attribute it is questionable whether an adequately
reliable experiment can be done; the adaptation necessary in Wald's (1940) treatment, in
which the observations $(x_i, y_i)$ are divided into two sets, is not obvious and perhaps not
appropriate. 


3-2. Structural theory 


Suppose that n measurements $(x_i, y_i)$ and independent information relating to the
distributions $p(\epsilon)\,d\epsilon$, $q(\theta)\,d\theta$ of the errors $(\epsilon, \theta)$ associated with (x, y) respectively are given,
these being supposed to be free from bias. When X is given in (3-1), Y can be calculated. Let
$X_i$ $(i = 1, \ldots, n)$ give the location of the true points corresponding to the observations $(x_i, y_i)$.
Thence the 2n errors are

$$\epsilon_i = (X_i - x_i), \quad \theta_i = (f_i - y_i), \tag{3-3}$$

where $f_i$ is evaluated at $X_i$.

Take s = n. Regarding the coefficients $\alpha_j$ as lying in an n-dimensional coefficient space
$(\alpha_0, \alpha_1, \ldots, \alpha_{n-1})$, the true coefficients may be thought of as a point. Then the errors $\theta$ are
given by variation of points in the coefficient space, i.e. by variation of $\alpha_j$.

The joint probability distribution of the 2n errors $\epsilon_i$, $\theta_i$ is

$$p(d\epsilon_1, d\epsilon_2, \ldots, d\epsilon_n, d\theta_1, \ldots, d\theta_n) = \prod_{i=1}^{n} p(\epsilon_i)\,q(\theta_i)\,d\epsilon_i\,d\theta_i. \tag{3-4}$$

Transform to 2n new variables $X_i$, $\alpha_j$. The Jacobian is easily seen to be the n x n determinant

$$\left\| \frac{\partial f_i}{\partial \alpha_j} \right\|$$

and the following equation

$$p(dX_1 \ldots dX_n, d\alpha_0 \ldots d\alpha_{n-1}) = \left\| \frac{\partial f_i}{\partial \alpha_j} \right\| \prod_{i=1}^{n} \prod_{j=0}^{n-1} p(X_i - x_i)\,q(f_i - y_i)\,dX_i\,d\alpha_j \tag{3-5}$$

is obtained.
This equation, which is basic for what follows, is interpreted as a probability density in
the 2n-fold of the n true values $X_i$ and the n-dimensional coefficient space $\alpha_j$. The $X_i$ locate
the true points on the curve and the $\alpha_j$ locate the curve having the functional form given
in (3-1).
Suppose now that the structural relation is a simple additive function of the disposable
coefficients $\alpha_j$, i.e.

$$f(X, \alpha_j) = \sum_{j=0}^{n-1} \alpha_j \phi_j(X), \tag{3-6}$$

where $\phi_j(X)$ are n independent functions of X only. Then

$$p\Bigl(\prod_{i=1}^{n} dX_i, \prod_{j=0}^{n-1} d\alpha_j\Bigr) = \left\| \phi_j(X_i) \right\| \prod_{i=1}^{n} \prod_{j=0}^{n-1} p(X_i - x_i)\,q(f_i - y_i)\,dX_i\,d\alpha_j. \tag{3-7}$$

It may be noted that the n functions $\phi_j(X)$ contain in themselves all the alternative
hypotheses, as is necessary in a structural theory.

To proceed further from equation (3-5) or (3-7) it is necessary to introduce an hypothesis 
for the true points X;. Strictly, this hypothesis should be related to the way in which the 
experiment is carried out. Thus a physicist would endeavour so to space his data along the 
curve as to give it good definition. However, it does not appear to be convenient to introduce 
such information into a general treatment, especially bearing in mind that it is uncertain 
whether the physicist has achieved (or even, can achieve) his objective. 

So long as there are n disposable parameters $\alpha_j$, the most probable positions of $X_i$ are the
observations $x_i$ themselves. But given an hypothesis that in the coefficient space of the $\alpha$'s
there is a region such that some of the $\alpha$'s are zero, then the likelihood may be maximized
with respect to the $X_i$, leaving a joint probability distribution for the remaining $\alpha$'s.

A third method is to suppose that each true point can be located anywhere on the curve 
and to form the n-fold integral of equation (3-5) or (3-7) over the range of possible variation 
of each of the $X_i$. This method is adopted here. It has the merit of stressing the structural
nature of the relation (3-1). 


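3-3. Two observations and normally distributed errors


When the errors are normally distributed and the scale of the observations is adjusted so
that both standard deviations are unity,

$$p(\epsilon)\,d\epsilon = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\epsilon^2}\,d\epsilon, \quad q(\theta)\,d\theta = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\theta^2}\,d\theta. \tag{3-8}$$

Here bias has been removed by taking zero means. Consider the linear relation

$$Y = \alpha_0 + \alpha_1 X. \tag{3-9}$$

With two observations, equation (3-7) becomes

$$p(d\alpha_0, d\alpha_1) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \begin{Vmatrix} 1 & X_1 \\ 1 & X_2 \end{Vmatrix} \exp\bigl[-\tfrac{1}{2}(X_1 - x_1)^2 - \tfrac{1}{2}(X_2 - x_2)^2 - \tfrac{1}{2}(-y_1 + \alpha_0 + \alpha_1 X_1)^2 - \tfrac{1}{2}(-y_2 + \alpha_0 + \alpha_1 X_2)^2\bigr]\,dX_1\,dX_2\,d\alpha_0\,d\alpha_1, \tag{3-10}$$

whence

$$p(d\alpha_0, d\alpha_1) = \frac{|\alpha_1(y_1 - y_2) + (x_1 - x_2)|}{2\pi(1 + \alpha_1^2)^2} \exp\Bigl[-\frac{1}{1 + \alpha_1^2}\Bigl\{\Bigl(\alpha_0 - \frac{y_1 + y_2}{2} + \alpha_1\frac{x_1 + x_2}{2}\Bigr)^2 + \Bigl(\frac{y_1 - y_2}{2} - \alpha_1\frac{x_1 - x_2}{2}\Bigr)^2\Bigr\}\Bigr]\,d\alpha_0\,d\alpha_1. \tag{3-11}$$

As might be expected, the probability distribution of $\alpha_1$ is independent of $\alpha_0$, whereas the
distribution of $\alpha_0$ contains $\alpha_1$.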





Transform to independent variates u, v by writing $\tan\theta$ for $\alpha_1$ and

$$\sqrt{2}\,u = (x_2 - x_1)\sin\theta - (y_2 - y_1)\cos\theta, \quad v/\sqrt{2} = \bigl(\alpha_0 - \tfrac{1}{2}(y_1 + y_2)\bigr)\cos\theta + \tfrac{1}{2}(x_1 + x_2)\sin\theta. \tag{3-12}$$

$v/\sqrt{2}$ is the length of the perpendicular from the midpoint of the join of the two observations
on to the line (3-9) and $u\sqrt{2}$ is the length of the projection, perpendicular to the line (3-9),
of the join of the two observations. Then u, v are each normally distributed with zero mean
and unit variance. The modal values of u, v correspond to the line passing through the
observations; for this line

$$\alpha_0 = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}, \quad \tan\theta = \frac{y_2 - y_1}{x_2 - x_1}. \tag{3-13}$$
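The transformation (3-12) and the modal line (3-13) can be checked numerically. What follows is a minimal Python sketch and not part of the original paper: the function names `uv_two_points` and `line_through` and the sample coordinates are illustrative assumptions, and unit error standard deviations are assumed as in (3-8).

```python
import math

def uv_two_points(p1, p2, alpha0, theta):
    """Return (u, v) of equation (3-12) for observations p1 = (x1, y1), p2 = (x2, y2)
    and the trial line Y = alpha0 + X * tan(theta)."""
    (x1, y1), (x2, y2) = p1, p2
    s, c = math.sin(theta), math.cos(theta)
    u = ((x2 - x1) * s - (y2 - y1) * c) / math.sqrt(2.0)
    v = math.sqrt(2.0) * ((alpha0 - 0.5 * (y1 + y2)) * c + 0.5 * (x1 + x2) * s)
    return u, v

def line_through(p1, p2):
    """Intercept and angle of the line through both observations, equation (3-13).
    Assumes x1 != x2."""
    (x1, y1), (x2, y2) = p1, p2
    alpha0 = (x2 * y1 - x1 * y2) / (x2 - x1)
    theta = math.atan((y2 - y1) / (x2 - x1))
    return alpha0, theta

# Hypothetical pair of observations, chosen only for illustration.
p1, p2 = (1.0, 2.0), (3.0, 5.0)
a0, th = line_through(p1, p2)
u_mode, v_mode = uv_two_points(p1, p2, a0, th)
```

For the line through both observations the sketch gives u and v equal to zero up to rounding. For any other trial line, u² + v² equals the sum of the squared perpendicular distances of the two observations from that line, which is the form used in (3-16).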


3-4. The $\phi$-number for the linear relation


Given 2n observations and normally distributed errors, write

$$\sqrt{2}\,u_r = (x_{2r} - x_{2r-1})\sin\theta_r - (y_{2r} - y_{2r-1})\cos\theta_r, \quad v_r/\sqrt{2} = \bigl(\alpha_r - \tfrac{1}{2}(y_{2r} + y_{2r-1})\bigr)\cos\theta_r + \tfrac{1}{2}(x_{2r} + x_{2r-1})\sin\theta_r, \tag{3-14}$$

so that the observations $(x_{2r-1}, y_{2r-1})$, $(x_{2r}, y_{2r})$ are associated with the line

$$Y = \alpha_r + X\tan\theta_r. \tag{3-15}$$

Then each of $u_r$, $v_r$ is distributed N(0, 1) and they are evidently independent. Thus

$$2n\phi = \sum_{r=1}^{n}(u_r^2 + v_r^2) = \sum_{r=1}^{n}\bigl[\{(\alpha_r - y_{2r})\cos\theta_r + x_{2r}\sin\theta_r\}^2 + \{(\alpha_r - y_{2r-1})\cos\theta_r + x_{2r-1}\sin\theta_r\}^2\bigr] \tag{3-16}$$

is distributed as $\chi^2_{2n}$. Now for the linear relation

$$\alpha_r = \alpha, \quad \theta_r = \theta, \tag{3-17}$$


and equation (3-15) becomes exact. Also the $\chi^2_{2n}$-distribution yields a likelihood in $\phi$ or
numerical space: hence a structural theory for the use of $\phi$-numbers in § 2. It is interesting
to note that the particular way in which this result is derived expresses the alternative
hypothesis in the form that one or more pairs of observations do not lie on a straight line
through the mean. It follows that when in § 2 there is no real locus of equally probable lines,
alternative hypotheses of the form indicated in this paragraph must be considered. More
fundamentally, alternatives deriving from the pattern of structural relations, e.g. that there
is a parabolic or higher degree relation, may be considered. For the latter a $\phi$-number has
not yet been found.
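As an illustration of (3-16) under the linear hypothesis (3-17), the quantity $2n\phi$ for a stated trial line can be computed and referred to $\chi^2_{2n}$. The Python sketch below is an assumption-laden example rather than the author's procedure: the names `two_n_phi` and `chi2_sf_even` and the four observations are hypothetical, both error standard deviations are taken as unity, and the line is supposed given in advance and not fitted to the same data.

```python
import math

def two_n_phi(points, alpha, theta):
    """2n*phi of (3-16) with alpha_r = alpha, theta_r = theta for every pair:
    the sum of squared perpendicular distances from Y = alpha + X*tan(theta)."""
    s, c = math.sin(theta), math.cos(theta)
    return sum(((alpha - y) * c + x * s) ** 2 for x, y in points)

def chi2_sf_even(x, dof):
    """Upper-tail probability of chi-square with an even number of degrees of
    freedom, using exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(dof // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total

# 2n = 4 made-up observations and the trial line Y = 1 + X.
points = [(0.0, 1.1), (1.0, 1.9), (2.0, 3.2), (3.0, 3.9)]
stat = two_n_phi(points, alpha=1.0, theta=math.atan(1.0))
tail = chi2_sf_even(stat, dof=len(points))
```

A small tail probability would count against the trial line; the closed-form tail is used only to keep the sketch free of external libraries.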

The validity of the linear hypothesis (3-17) may also be tested by examining whether 
there are some $(\alpha, \theta)$ for which $u_r$, $v_r$ are normal deviates. To this end, statistics such as the
mean, sum of squares and k,kz!, where the k’s are cumulants, might be computed and 
compared with their known distributions. Such tests are only approximate in virtue of 
equation (3-15). 













3-5. The general case 


For normally distributed errors, the general case remains unsolved. For example a linear 
relation as a limiting case of a parabola cannot yet be obtained. The difficulties are con- 
siderable. Those to be expected in any problem of curve fitting become evident, with the 
further complication that the slope of the curve in the neighbourhood of each observation 
enters into the compounded error variance, so that the equations are non-linear. An indica- 
tion of an approximate method—specifically applied to a parabolic relation—is given in an 
appendix. It is hoped to discuss elsewhere further developments from equation (3-5) 
having in mind that the Jacobian contains in itself the whole scheme of alternative 
hypotheses that can be introduced into a discussion of any class of bivariate structural 
relationship. 


4. CONCLUSION 


Whether or not the physicist or engineer needs to use statistical methods of inference is 
perhaps beside the point. His procedure consists of a series of steps. First, an endeavour is
made to isolate certain phenomena. Secondly, an attempt is made to reduce them to 
measurements by conventional rules, these being generally accepted by scientists. Thirdly, 
the results of reliable experiments are compared with (a) general background knowledge 
and (b) present theoretical (structural) conceptions. Then a fresh experiment is designed. 
The interesting question is whether an investigation of the type undertaken here can show 
the strength and weakness of the several steps, adopted intuitively or unconsciously, by 
the practising scientist. For this reason, further investigations on the present lines appear 
to be necessary. 


The author is indebted to Mr F. Fereday, Mr S. R. Broadbent and Mr W. D. Ray for
several useful discussions during the course of the investigations and to Mr J. G. Bennett 
for guidance in formulating his ideas in the form of the three regularities given in § 3-1. 


REFERENCES 


BENNETT, J. G. (1953). Proc. 11th Int. Congr. Philosophy, 6, 102.

BENNETT, J. G. (1957). The Dramatic Universe. 1. The Foundations of Natural Philosophy. London:
Hodder and Stoughton.

BENNETT, J. G., BROWN, R. L. & THRING, M. W. (1949). Proc. Roy. Soc. A, 198, 39.

BROAD, C. D. (1927-8). The principles of problematic induction. Proc. Arist. Soc. p. 1.

BROWN, R. L. (1955). Essence of design of experiments. Appl. Mech. Rev., October.

BROWN, R. L. (1947). The brittle fracture of pre-cracked solids. Research, 1.

CALDIROLA, P. (1942). Nuovo Cim. 19, 25.

JEFFREYS, H. (1948). Theory of Probability, 2nd ed. Oxford University Press.

KALUZA, A. (1921). S.B. preuss. Akad. Wiss. p. 968.

KENDALL, M. G. (1951, 1952). Regression, structure and functional relationships. Biometrika, 38,
11, and 39, 96.

LINDLEY, D. V. (1947). Regression lines and the linear functional relationship. Suppl. J. R. Statist.
Soc. 9, 218.

MORAN, P. A. P. (1956). A test of significance for an unidentifiable relation. J. R. Statist. Soc. B,
18, 61.

PODOLANSKI, J. (1950). Proc. Roy. Soc. A, 201, 234.

WALD, A. (1940). The fitting of straight lines if both variables are subject to error. Ann. Math.
Statist. 11, 284.













APPENDIX 
Approximate treatment of the curvilinear relation by relaxation of constraints 


The exact solution given for two observations and normally distributed errors can be used to treat 
approximately the general case. First we relax the constraints on the observations imposed by the 
structural relation. We have to assume that the observations are fairly evenly spaced along the curve 
and that there are no ‘repeat’ observations. Repeat observations are equivalent to a reduction in the 
degree of the curve and would require, in any theory, special treatment. We use the approximation (3-15)
to find the intercepts $k_s$ of the curve on an equidistant grid of ordinates with abscissae (sh + l). The grid
is chosen so that the abscissa (sh + l) is close to the observed value $x_s$, there being 2n lines on the grid. Then
choosing the origin so that l is zero:

$$k_{2r} = \alpha_r + (2rh)\tan\theta_r, \quad k_{2r-1} = \alpha_r + \bigl((2r-1)h\bigr)\tan\theta_r, \tag{A 1}$$

and thence

$$\alpha_r = \bigl(2r\,k_{2r-1} - (2r-1)\,k_{2r}\bigr), \quad \tan\theta_r = \frac{1}{h}(k_{2r} - k_{2r-1}). \tag{A 2}$$
In this case 
attra) ithe haa SF, Cath th a ay 


1 + (Kap — Keay) /h? 1+ (ko, — Kegy)/h? 
The constraints on $k_s$ arising from a structural relation are now easily introduced. For example, a
polynomial of degree m gives

$$k_s = \sum_{i=0}^{m} \alpha_i (sh)^i, \tag{A 4}$$

there being (m+1) unknown parameters $\alpha_i$. Since $2n\phi$, or $\sum_{r=1}^{n}(u_r^2 + v_r^2)$, has a $\chi^2$-distribution, minimum $\chi^2$
estimators (also maximum-likelihood estimators) may then be found for the $\alpha_i$. The minimum value of
$2n\phi$ has then an approximate $\chi^2_{(2n-m-1)}$ distribution. The degree of the polynomial necessary for
representation of the data can thus be found. Owing to the form of equation (A 3), it is necessary in practice
to adopt an iterative method for finding the parameters $\alpha_i$.


Parabolic relation 


The approximate treatment of the preceding sub-section yields a statistical test for establishing the 
degree m of a polynomial (A 4) that adequately represents a structural relation between two variates, 
each subject to normal errors, the alternative hypothesis being that a polynomial of degree (m+1) or
greater is not necessary. Consider the parabolic relation 

$$Y = \alpha_0 + \alpha_1 X + \alpha_2 X^2. \tag{A 5}$$

When the points on the grid $(sh, k_s)$ lie on this parabola, we have from (A 3)

$$\alpha_r = \alpha_0 - \alpha_2 h^2\,2r(2r-1), \quad \tan\theta_r = \alpha_1 + \alpha_2 h(4r-1). \tag{A 6}$$
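The closed forms in (A 6) can be verified against the grid relations: with l = 0, the intercepts $k_s$ of a parabola are inserted into $\alpha_r = 2r\,k_{2r-1} - (2r-1)\,k_{2r}$ and $\tan\theta_r = (k_{2r} - k_{2r-1})/h$. The Python sketch below is only a numerical check under assumed values; the function names and the coefficients, spacing h and number of pairs n are hypothetical.

```python
def grid_intercepts(coeffs, h, n):
    """Intercepts k_s = sum_i alpha_i (s h)^i on the grid s = 1, ..., 2n (origin l = 0)."""
    return {s: sum(a * (s * h) ** i for i, a in enumerate(coeffs))
            for s in range(1, 2 * n + 1)}

def pair_lines(k, h, n):
    """(alpha_r, tan theta_r) of the chord joining grid points 2r-1 and 2r."""
    lines = []
    for r in range(1, n + 1):
        tan_theta = (k[2 * r] - k[2 * r - 1]) / h
        alpha_r = 2 * r * k[2 * r - 1] - (2 * r - 1) * k[2 * r]
        lines.append((alpha_r, tan_theta))
    return lines

# Hypothetical parabola Y = a0 + a1 X + a2 X^2 and grid.
a0, a1, a2 = 1.0, 0.5, 0.25
h, n = 0.5, 3
lines = pair_lines(grid_intercepts((a0, a1, a2), h, n), h, n)

# Closed forms of (A 6) for comparison.
closed_form = [(a0 - a2 * h * h * 2 * r * (2 * r - 1), a1 + a2 * h * (4 * r - 1))
               for r in range(1, n + 1)]
```

The two lists agree to rounding error, which confirms that (A 6) is the parabolic special case of the chord relations.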
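Then the equations determining $\alpha_0$, $\alpha_1$, $\alpha_2$ for minimum $2n\phi$ are

$$\begin{aligned}
&\sum_{r=1}^{n} a_r\cos\theta_r = 0, \\
&\sum_{r=1}^{n} c_r\cos^2\theta_r = 0, \\
&\sum_{r=1}^{n} \{2r\,c_r\cos^2\theta_r - h\,r(2r-1)\,a_r\cos\theta_r\} = 0,
\end{aligned} \tag{A 7}$$

where

$$\begin{aligned}
a_r &= (2\alpha_r - y_{2r} - y_{2r-1})\cos\theta_r + (x_{2r} + x_{2r-1})\sin\theta_r, \\
c_r &= \tfrac{1}{2}\sin 2\theta_r\,[2(x_{2r}^2 + x_{2r-1}^2) - (y_{2r} - y_{2r-1})^2 - (2\alpha_r - y_{2r} - y_{2r-1})^2] \\
&\quad + 2\cos 2\theta_r\,[\alpha_r(x_{2r} + x_{2r-1}) - (x_{2r}y_{2r} + x_{2r-1}y_{2r-1})].
\end{aligned} \tag{A 8}$$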








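Let $\alpha_0^1$, $\alpha_1^1$, $\alpha_2^1$ be an approximate solution of equation (A 7) and $\alpha_r^1$, $\theta_r^1$, $a_r^1$, $c_r^1$ the corresponding quantities
from equations (A 6, 8). Then expanding (A 7) by Taylor's theorem, we find the approximate linear
simultaneous equations for $\Delta\alpha_0$, $\Delta\alpha_1$, $\Delta\alpha_2$, where $\Delta\alpha_0 = (\alpha_0 - \alpha_0^1)$, etc., to be

$$\begin{aligned}
&\Sigma\,2\cos^2\theta_r^1\,\Delta\alpha_0 + \Sigma\,b_r^1\cos^2\theta_r^1\,\Delta\alpha_1 + \Sigma\{b_r^1(4r-1) - h\,4r(2r-1)\}\cos^2\theta_r^1\,h\Delta\alpha_2 + \Sigma\,a_r^1\cos\theta_r^1 = 0, \\
&\Sigma\,2b_r^1\cos^2\theta_r^1\,\Delta\alpha_0 + \Sigma\,e_r^1\cos^2\theta_r^1\,\Delta\alpha_1 + \Sigma\{e_r^1(4r-1) - h\,b_r^1\,4r(2r-1)\}\cos^2\theta_r^1\,h\Delta\alpha_2 + \Sigma\,c_r^1\cos^2\theta_r^1 = 0, \\
&\Sigma\,2r\cos^2\theta_r^1\,(2b_r^1 - h(2r-1))\,\Delta\alpha_0 + \Sigma\,r\cos^2\theta_r^1\{2e_r^1 - h(2r-1)\,b_r^1\}\,\Delta\alpha_1 \\
&\quad + \Sigma\,r\cos^2\theta_r^1\{2(4r-1)\,e_r^1 - h\,b_r^1(2r-1)(12r-1) + h^2\,4r(2r-1)^2\}\,h\Delta\alpha_2 \\
&\quad + \Sigma\,(2c_r^1\cos\theta_r^1 - h(2r-1)\,a_r^1)\,r\cos\theta_r^1 = 0,
\end{aligned} \tag{A 9}$$

where

$$\begin{aligned}
b_r &= (x_{2r} + x_{2r-1})\cos 2\theta_r - (2\alpha_r - y_{2r} - y_{2r-1})\sin 2\theta_r, \\
e_r &= \{2(x_{2r}^2 + x_{2r-1}^2) - (y_{2r} - y_{2r-1})^2 - (2\alpha_r - y_{2r} - y_{2r-1})^2\}\,\{\cos 2\theta_r\cos^2\theta_r - \tfrac{1}{2}\sin^2 2\theta_r\} \\
&\quad - \{2\sin 2\theta_r\,(2\cos^2\theta_r + \cos 2\theta_r)\}\,\{\alpha_r(x_{2r} + x_{2r-1}) - (x_{2r}y_{2r} + x_{2r-1}y_{2r-1})\}.
\end{aligned} \tag{A 10}$$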


Using equations (A 9) an iterative method may be employed to find the values of $\alpha_0$, $\alpha_1$, $\alpha_2$ that minimize
$(2n\phi)$. The first approximation may be found easily from the equation of the parabola through three
likely points on the curve. 


Note added by Author in proof 


Professor M. G. Kendall has kindly drawn my attention to a paper in which Working 
and Hotelling (1929)* calculate the variances of the slope and intercept of a linear regres- 
sion and show that these define an hyperbolic region within which the ‘true’ line lies with 
an assigned probability. Their hyperbola is similar to that given in equation (2-15) above, 
the denominator of the first term (Y —a,X)? being (V7) instead of (0, —w). Working and 
Hotelling ascribe two degrees of freedom to $\chi^2$, due to a (presumed) independent variation
of the slope and intercept of the line: I have been unable to follow their reasoning on 
this point. In my case, n@, has a y%,-distribution, but I have called for independent 
knowledge of the variance in the error of the y-measurement. 

The two papers deal with different questions; in particular, I was mainly concerned 
with the case when both variates are in error. But the duality of the confidence ellipse 
and the acceptance hyperbola noted in my section 2-2 is anticipated in the Working and
Hotelling paper. 


* H. Working & H. Hotelling (1929). Applications of the theory of error to the interpretation of 
trends. J. Amer. Statist. Ass. Suppl. 24, 73. 
















AN ANALYSIS OF PAIRED COMPARISON DESIGNS 
WITH INCOMPLETE REPETITIONS†


By JOHN W. WILKINSON 


University of North Carolina 


1. INTRODUCTION 


Experiments involving paired comparisons have mainly concerned the situation where 
each judge compares all possible pairs of treatments or objects. In certain types of experi- 
ments, this may require an excessive number of comparisons to be made by any observer. 
To overcome this handicap, Bose (1956) and Kendall (1955) constructed certain designs, 
symmetrical with respect to objects and judges, which do not require each judge to compare 
all possible pairs of objects. However, neither Bose nor Kendall proposed any procedure 
of analysis in their respective papers. 

The purpose of this paper is to consider the problem of analysis of the Bose—Kendall 
paired comparison designs. This analysis is carried out in the tradition of the fundamental 
Bradley—Terry (1952) paper concerning the situation where all possible pairs are compared 
by each judge. Using the Bradley-Terry mathematical model, likelihood ratio tests are 
constructed in detail for certain classes of hypotheses, and are stated for some additional 
situations of interest. To exemplify the test procedures, the proposed analysis is applied 
to an experiment involving pairwise comparison of handwriting specimens. 


2. DEFINITION OF PAIRED COMPARISON DESIGNS 
A paired comparison design, as defined by Bose, is as follows: 


Suppose we have t objects which we desire to compare according to a certain characteristic, 
and we have v judges available to perform the comparisons. The procedure of comparison 
will be as follows: 


(i) Each judge compares r pairs of objects $(1 < r \le \tfrac{1}{2}t(t-1))$, and, for each pair compared,
expresses his preference for one or the other object of the pair. (It may be desirable to allow 
the judge to express no preference with respect to either of the objects forming the pair. 
However, this possibility will not be entertained in this paper.) 

(ii) The pairs compared by any judge are all different. 

(iii) Among the r pairs compared by each judge, each object appears equally often,
say $\alpha$ times.

(iv) Each pair is compared by k judges $(1 < k \le v)$.

(v) Given any two judges, there are exactly $\lambda$ pairs which are compared by both judges.

Bose constructed three series of paired comparison designs by making certain corre- 
spondences with known balanced incomplete block designs. One of these series produced 
the same designs as those constructed by Kendall. 

† This research was jointly supported by the United States Air Force through the Air Force Office
of Scientific Research of the Air Research and Development Command, and the National Research 
Council of Canada. 




From the relationships among the parameters of balanced incomplete block designs, and
from the above definition of paired comparison designs, there exist the following
relationships among the parameters of a paired comparison design:


r= ta, A(jv—l)=r(k-1), va=kt-1), tar>2k, A<ha(a+l)<r. (2-1) 


3. EXPERIMENTAL PROCEDURE AND NOTATION 


Suppose we have available v judges, and we require each of these judges to compare r pairs
of the $\tfrac{1}{2}t(t-1)$ possible pairs of a set of t objects $(a_1, a_2, \ldots, a_t)$, $(1 < r \le \tfrac{1}{2}t(t-1))$. The r pairs
to be compared by each judge will be specified by the field plan of the appropriate paired
comparison design employed. Presently available designs are given by Bose (1956) and
by Wilkinson (1956). We shall refer to one set of r pairs compared by a specific judge as an
incomplete repetition or a complete repetition, depending on whether $r < \tfrac{1}{2}t(t-1)$ or
$r = \tfrac{1}{2}t(t-1)$. When $r = \tfrac{1}{2}t(t-1)$, the situation becomes that which is considered in detail
by Bradley & Terry (1952), Bradley (1954a, b, 1955) and Terry, Bradley & Davis (1952).

For each pair of objects compared by judge u (u = 1, ..., v), the judge expresses a
preference for one object or the other, assigning the rank one to the object judged superior,
and the rank two to the object judged inferior.

Let $r_{iju}$ denote the rank assigned by judge u to the ith object when it is compared with the
jth object. Let $(a_i \to a_j \mid u)$ denote the preference indicated by judge u for object $a_i$ over
object $a_j$ $(i \ne j;\ i, j = 1, \ldots, t;\ u = 1, \ldots, v)$. Then

$$r_{iju} = \begin{cases} 1, & \text{if } (a_i \to a_j \mid u), \\ 2, & \text{if } (a_i \leftarrow a_j \mid u), \end{cases} \tag{3-1}$$

if objects $a_i$ and $a_j$ are compared by judge u. From (3-1) it is readily observed that

$$r_{iju} + r_{jiu} = 3, \tag{3-2}$$

when objects $a_i$ and $a_j$ are compared by judge u. To handle the complication created by the
fact that each judge does not compare all possible pairs of objects, we shall define

$$n_{iju} = \begin{cases} 1, & \text{if } a_i \text{ and } a_j \text{ are compared by judge } u, \\ 0, & \text{if } a_i \text{ and } a_j \text{ are not compared by judge } u, \end{cases} \tag{3-3}$$

$(i \ne j;\ i, j = 1, \ldots, t;\ u = 1, \ldots, v)$, and we shall conventionally take

$$n_{iiu} = 0 \quad (i = 1, \ldots, t;\ u = 1, \ldots, v).$$

Hence, for each judge u we can define an incidence matrix $N_u = (n_{iju})$ which is symmetric,
has 1's or 0's for elements, and is specified by the field plan of the paired comparison design
used. These matrices $N_u$ satisfy the conditions

$$\begin{aligned}
&(a)\quad N_u E = \alpha E \quad (u = 1, \ldots, v), \\
&(b)\quad \sum_{u=0}^{v} N_u = kE, \\
&(c)\quad \operatorname{tr} N_u N_{u'} = 2\lambda \quad (u, u' = 1, 2, \ldots, v;\ u \ne u'),
\end{aligned} \tag{3-4}$$

where E is a matrix of order t, each of whose elements is unity, and where $N_0 = kI$, I being
the identity matrix of order t. In fact, these conditions are the necessary and sufficient
conditions for the matrices $N_u$ to specify a paired comparison design as defined in §2.
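Conditions (a), (b) and (c) of (3-4) are straightforward to test for a proposed field plan. The Python sketch below is an illustration only: the helper names and the small plan (t = 4 objects, v = 3 judges, each judge omitting one of the three perfect matchings of the four objects) are assumptions made for the example and are not claimed to be one of the Bose or Kendall designs.

```python
from itertools import combinations

def incidence_matrices(t, plan):
    """plan[u] lists the pairs (i, j), objects numbered 1..t, compared by judge u."""
    mats = []
    for pairs in plan:
        N = [[0] * t for _ in range(t)]
        for i, j in pairs:
            N[i - 1][j - 1] = N[j - 1][i - 1] = 1
        mats.append(N)
    return mats

def design_parameters(t, plan):
    """Return (alpha, k, lam) if conditions (a)-(c) of (3-4) hold, otherwise None.
    Needs at least two judges so that condition (c) can be examined."""
    # Condition (ii) of the definition: no judge compares the same pair twice.
    if any(len({frozenset(p) for p in pairs}) != len(pairs) for pairs in plan):
        return None
    mats = incidence_matrices(t, plan)
    # (a) every row of every N_u sums to the same alpha
    row_sums = {sum(row) for N in mats for row in N}
    # (b) every off-diagonal element of sum_u N_u equals the same k
    totals = {sum(N[i][j] for N in mats)
              for i in range(t) for j in range(t) if i != j}
    # (c) tr(N_u N_u') takes the same value 2*lam for all distinct judges
    traces = {sum(A[i][j] * B[j][i] for i in range(t) for j in range(t))
              for A, B in combinations(mats, 2)}
    if len(row_sums) == len(totals) == len(traces) == 1:
        return row_sums.pop(), totals.pop(), traces.pop() // 2
    return None

# Hypothetical plan: each judge compares the four pairs outside one perfect matching.
plan = [[(1, 3), (1, 4), (2, 3), (2, 4)],
        [(1, 2), (1, 4), (2, 3), (3, 4)],
        [(1, 2), (1, 3), (2, 4), (3, 4)]]
params = design_parameters(4, plan)
```

For this plan the sketch returns α = 2, k = 2 and λ = 2, and with r = 4 pairs per judge the equalities 2r = tα, vα = k(t − 1) and λ(v − 1) = r(k − 1) are all satisfied.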







Conditions (a), (b), (c) are directly necessary for the conditions (iii), (iv), (v) of §2. Condition
(ii) is satisfied since $N_u$ has only 0 and 1 as elements. Since 2r is the number of unities in $N_u$,
the constancy of r follows from (a), which implies the number of unities in each row of $N_u$
to be $\alpha$. Hence $2r = t\alpha$.

To simplify notation, we will introduce the following conventions: $\sum_i$, $\sum_j$, $\sum_u$, $\sum_m$ and
$\prod_i$, $\prod_j$, $\prod_u$, $\prod_m$ will indicate, respectively, single sums and products with respect to the
depicted quantity over its range, where i = 1, ..., t; j = 1, ..., t; u = 1, ..., v; m = 1, ..., g.
$\sum_j'$ will denote $\sum_{j \ne i}$ for some i, where this i will be clear from the context. $\sum_{i<j}$ and $\prod_{i<j}$ will
indicate, respectively, double sums and products, i = 1, ..., j-1; j = 2, ..., t. $\sum_{i \ne j}$ and $\prod_{i \ne j}$ will
indicate, respectively, $\sum_{i=1}^{t}\sum_{j=1,\,j \ne i}^{t}$ and $\prod_{i=1}^{t}\prod_{j=1,\,j \ne i}^{t}$. Any departures from these conventions will be
specified when the departure is incurred. In addition, log and ln will be used to denote
common and natural logarithms, respectively.


4. MATHEMATICAL MODEL (BRADLEY-TERRY MODEL)


Suppose there exist numbers $\pi_{1u}, \ldots, \pi_{tu}$ corresponding to judge u (u = 1, ..., v), where

$$\pi_{iu} \ge 0 \quad (i = 1, \ldots, t;\ u = 1, \ldots, v), \qquad \sum_i \pi_{iu} = 1 \quad (u = 1, \ldots, v), \tag{4-1}$$

and such that

$$P(a_i \to a_j \mid u) = \frac{\pi_{iu}}{\pi_{iu} + \pi_{ju}}, \qquad P(a_i \leftarrow a_j \mid u) = \frac{\pi_{ju}}{\pi_{iu} + \pi_{ju}}, \tag{4-2}$$

$(i \ne j;\ i, j = 1, \ldots, t;\ u = 1, \ldots, v)$. The conditions in (4-1) are imposed partially for
convenience and partially to ensure determinacy in certain systems of equations which will be
encountered. These numbers $\pi_{1u}, \ldots, \pi_{tu}$ will be considered as true ratings (or preferences)
of the t objects $a_1, \ldots, a_t$ corresponding to the judge u.


5. THE LIKELIHOOD FUNCTION 


From (4-2), the probability of judge u assigning rank $r_{iju}$ to object $a_i$ and rank $r_{jiu} = 3 - r_{iju}$
to object $a_j$ upon comparison of the pair of objects is

$$n_{iju}\left(\frac{\pi_{iu}}{\pi_{iu} + \pi_{ju}}\right)^{2 - r_{iju}}\left(\frac{\pi_{ju}}{\pi_{iu} + \pi_{ju}}\right)^{2 - r_{jiu}} = n_{iju}\,\frac{\pi_{iu}^{\,2 - r_{iju}}\,\pi_{ju}^{\,2 - r_{jiu}}}{\pi_{iu} + \pi_{ju}}. \tag{5-1}$$

Provided that the pair of objects is compared by judge u, we observe that if $(a_i \to a_j \mid u)$,
then $r_{iju} = 1$, $r_{jiu} = 2$ and (5-1) becomes

$$\frac{\pi_{iu}}{\pi_{iu} + \pi_{ju}} = P(a_i \to a_j \mid u),$$

and if $(a_i \leftarrow a_j \mid u)$, then $r_{iju} = 2$, $r_{jiu} = 1$ and (5-1) becomes

$$\frac{\pi_{ju}}{\pi_{iu} + \pi_{ju}} = P(a_i \leftarrow a_j \mid u).$$

If we assume probability independence between pairs of objects when compared by
judge u, we obtain the likelihood function for the set of comparisons made by judge u,

$$L_u = \prod_R \frac{\pi_{iu}^{\,2 - r_{iju}}\,\pi_{ju}^{\,2 - r_{jiu}}}{\pi_{iu} + \pi_{ju}} = \frac{\prod_i \pi_{iu}^{\,2\alpha - \sum_j n_{iju} r_{iju}}}{\prod_{i<j} (\pi_{iu} + \pi_{ju})^{n_{iju}}}, \tag{5-2}$$

where R denotes the set of numbers $R = \{i, j:\ i < j,\ i = 1, \ldots, j-1;\ j = 2, \ldots, t;\ n_{iju} = 1\}$.
If we assume probability independence between experiments performed by different
judges, then from (5-2) we obtain the likelihood function for the complete set of comparisons
made by the v judges,

$$L_0 = \prod_u L_u, \tag{5-3}$$

and hence, if we set

$$a_i^* = 2\alpha v - \sum_j \sum_u n_{iju} r_{iju},$$

we have

$$\ln L_0 = \sum_i a_i^* \ln \pi_{iu} - \sum_{i<j} \sum_u n_{iju} \ln(\pi_{iu} + \pi_{ju}). \tag{5-4}$$
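The log-likelihood can also be accumulated directly from the recorded preferences: by (5-1), each comparison made by judge u contributes $\ln\pi_{wu} - \ln(\pi_{wu} + \pi_{lu})$, where w is the preferred and l the other object. The Python sketch below is a hedged illustration; the function name, the zero-based indexing and the list of preferences are all hypothetical.

```python
import math

def log_likelihood(ratings, preferences):
    """ratings[u][i] is the rating pi_{iu} of object i for judge u (zero-based).
    preferences is a list of (u, preferred, other) triples, one per comparison."""
    total = 0.0
    for u, preferred, other in preferences:
        p = ratings[u]
        total += math.log(p[preferred]) - math.log(p[preferred] + p[other])
    return total

# Made-up outcome for t = 4 objects and v = 3 judges, r = 4 comparisons each.
t, v = 4, 3
uniform = [[1.0 / t] * t for _ in range(v)]
prefs = [(0, 0, 2), (0, 3, 0), (0, 1, 2), (0, 1, 3),
         (1, 0, 1), (1, 0, 3), (1, 2, 1), (1, 3, 2),
         (2, 1, 0), (2, 0, 2), (2, 1, 3), (2, 2, 3)]
null_value = log_likelihood(uniform, prefs)
```

When every rating equals 1/t, each comparison contributes ln(1/2), so the value is minus the number of comparisons times ln 2, whatever the recorded preferences.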


6. LIKELIHOOD RATIO TESTS 


Test I. Consider the situation where each of the v judges compares only one set of r pairs,
the set being specified by the paired comparison design used. We desire information about
the differences, if any, among the true ratings of the t objects. Initially, we are willing to
make the assumption that the v judges are consistent as a group; that is, we desire to test
the null hypothesis,

$$H_0:\ \pi_{iu} = 1/t,$$

against the alternative hypothesis,

$$H_1:\ \pi_{iu} = \pi_i \quad (i = 1, \ldots, t;\ u = 1, \ldots, v), \tag{6-1}$$

where the $\pi_i$'s are not all equal.
When the alternative hypothesis $H_1$ is true, (5-4) becomes

$$(\ln L_0 \mid H_1) = \sum_i a_i^* \ln \pi_i - k \sum_{i<j} \ln(\pi_i + \pi_j), \tag{6-2}$$

where the simplification of the latter term is aided by the properties of the incidence matrices
listed in § 3.
We note that $a_i^*$, defined in (5-4), can be rewritten, using (2-1), as

$$a_i^* = 2k(t-1) - \sum_j \sum_u n_{iju} r_{iju}.$$

When $r = \tfrac{1}{2}t(t-1)$, then $n_{iju} = 1$ for all i, j, u $(i \ne j)$, k = v, and

$$a_i^* = 2v(t-1) - \sum_j{}' \sum_u r_{iju},$$

and corresponds to $a_i$, defined by Bradley & Terry (1952), with n replaced by v.
Now maximizing (6-2), subject to the constraint $\sum_i \pi_i = 1$, yields the set of equations

$$\frac{a_i^*}{\pi_i} - k \sum_j{}' (\pi_i + \pi_j)^{-1} + \mu = 0 \quad (i = 1, \ldots, t), \qquad \sum_i \pi_i = 1. \tag{6-3}$$





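Multiplying the ith equation of (6-3) by $\pi_i$ and summing with respect to i yields

$$(2\alpha t - 4r)v + \mu = 0,$$

which implies $\mu = 0$, since from (2-1), $r = \tfrac{1}{2}t\alpha$. Hence the maximum-likelihood estimates
$p_1, \ldots, p_t$ of $\pi_1, \ldots, \pi_t$ will be obtained from the system of equations

$$\frac{a_i^*}{p_i} - k \sum_j{}' (p_i + p_j)^{-1} = 0 \quad (i = 1, \ldots, t), \qquad \sum_i p_i = 1. \tag{6-4}$$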

When the null hypothesis $H_0$ is true, (5-4) becomes

$$(\ln L_0 \mid H_0) = (2\alpha v t - 3rv) \ln \frac{1}{t} - \frac{kt(t-1)}{2} \ln \frac{2}{t} = -vr \ln 2. \tag{6-5}$$

Thus the likelihood-ratio test of the null hypothesis $H_0$ against the alternative hypothesis
$H_1$, as specified in (6-1), will be in terms of the likelihood ratio $\lambda_1$, where, if we define

$$B_1^* = k \sum_{i<j} \log(p_i + p_j) - \sum_i a_i^* \log p_i, \tag{6-6}$$

where $p_1, \ldots, p_t$ are solutions of (6-4), then

$$\ln \lambda_1 = -\{vr \ln 2 - B_1^* \ln 10\}. \tag{6-7}$$
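Given estimates $p_i$, the statistic of (6-6) and the log likelihood ratio of (6-7) follow by direct substitution. The Python sketch below is illustrative: the function names and sample values are hypothetical, and the total number of comparisons vr is recovered as $\sum_i a_i^*$, which equals $(2\alpha t - 3r)v = rv$ when $r = \tfrac{1}{2}t\alpha$.

```python
import math

def b_statistic(p, a_star, k):
    """Statistic of (6-6), computed with common (base 10) logarithms."""
    t = len(p)
    pair_term = sum(math.log10(p[i] + p[j]) for j in range(t) for i in range(j))
    return k * pair_term - sum(a * math.log10(q) for a, q in zip(a_star, p))

def log_lambda(p, a_star, k):
    """Natural logarithm of the likelihood ratio, equation (6-7)."""
    vr = sum(a_star)  # total number of comparisons made by all judges
    return -(vr * math.log(2.0) - b_statistic(p, a_star, k) * math.log(10.0))

# Hypothetical case with t = 2 objects compared k = 4 times: estimates 3/4 and 1/4.
value = log_lambda([0.75, 0.25], [3, 1], k=4)
# Equal preferences give equal estimates and a likelihood ratio of one.
null_case = log_lambda([1 / 3, 1 / 3, 1 / 3], [2, 2, 2], k=2)
```

The first value is negative, as a log likelihood ratio must be, and the second is zero to rounding error because equal estimates reproduce the null likelihood (6-5).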


When $r = \tfrac{1}{2}t(t-1)$, then k = v, and $B_1^*$ corresponds with the statistic $B_1$ discussed by
Bradley & Terry (1952).

For the likelihood ratio in (6-7) to be useful to test the hypothesis stated in (6-1), it
would be desirable to have some knowledge of the distribution of $B_1^*$, at least when the
null hypothesis $H_0$ is true.

It is possible to generate all combinations of the object sums of ranks. Then under the
null hypothesis of equality of true object ratings for each judge, the probability of each
combination of rank sums is obtainable. This information would be sufficient to obtain the
distribution of $B_1^*$ under $H_0$.

However, a direct computation of the probabilities of these various combinations of rank
sums would be extremely tedious, even for small values of the design parameters. The fact
that every permutation of the rank sums corresponding to each judge is not possible, since
each judge does not compare all possible $\tfrac{1}{2}t(t-1)$ pairs, is the cause of the complication.
However, we can utilize the symmetry of the designs to circumvent this particular difficulty.

Let $A_i$ $(i = 1, \ldots, t)$ denote the set of $v\alpha = k(t-1)$ elements which are the $k(t-1)$ ranks assigned by all the $v$ judges to the object $a_i$ when compared with the other $(t-1)$ objects. That there are $v\alpha = k(t-1)$ such ranks follows from the fact that each of the $v$ judges compares $a_i$ with exactly $\alpha$ of the remaining $(t-1)$ objects. Also, from the design properties depicted in §§ 2 and 3, it follows that these $k(t-1)$ ranks are those for $a_i$ after comparison with each of the remaining $(t-1)$ objects exactly $k$ times. Hence, we can subdivide $A_i$ into $k$ disjoint subsets $A_{iu'}$, with each $A_{iu'}$ containing $(t-1)$ elements which are the ranks for








102 Paired comparison designs with incomplete repetitions 


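$a_i$ after comparison with each of the remaining $(t-1)$ objects exactly once. Thus if we denote the $(t-1)$ elements of $A_{iu'}$ by $r_{iju'}$ $(j \ne i;\ j = 1, \ldots, t)$, then the sum of elements of $A_{iu'}$ is

$$s_{iu'} = \sum_j r_{iju'}, \qquad \text{(6-8)}$$

and the sum of the elements of $A_i$ is

$$\sum_{u'=1}^{k} s_{iu'} = \sum_j \sum_u n_{iju} r_{iju}. \qquad \text{(6-9)}$$

Hence the set of rank sums

$$\Big(\sum_j \sum_u n_{1ju} r_{1ju}, \ldots, \sum_j \sum_u n_{tju} r_{tju}\Big) = \Big(\sum_{u'=1}^{k} s_{1u'}, \ldots, \sum_{u'=1}^{k} s_{tu'}\Big) = \sum_{u'=1}^{k} (s_{1u'}, \ldots, s_{tu'}) \qquad \text{(6-10)}$$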


could be considered as being the set of rank sums obtained from $k$ complete repetitions of all $\tfrac{1}{2}t(t-1)$ pairs. It is these rank sums which determine $B^{(1)}$, and hence its distribution can be determined from the distribution of the different combinations of these rank sums. The possibility of putting the rank sums in the form of (6-10) enables the distribution of $B^{(1)}$, under the null hypothesis $H_0$ of equality of true object ratings, to be obtained in a manner described by Bradley & Terry (1952). Hence, tables presented by Bradley & Terry (1952) and by Bradley (1954b) may be used to provide the distribution of $B^{(1)}$ under the null hypothesis $H_0$. The tables are available for design parameters $t$ and $k$ in the following range: $t = 3$, $k = 1, \ldots, 10$; $t = 4$, $k = 1, \ldots, 8$; $t = 5$, $k = 1, \ldots, 5$. These tables also list the estimates $p_1, \ldots, p_t$ corresponding to the rank sum combination (6-10).

Thus a test procedure for (6-1), for $t$ and $k$ in the indicated range, is as follows: Determine the rank sums (6-10). Then from the previously indicated tables, obtain the corresponding values for $p_1, \ldots, p_t$, $B^{(1)}$, and the probability $P$ that $B^{(1)}$ will not be exceeded if the null hypothesis $H_0$ is true.

If either $t$ or $k$ is outside the range for the tables, the above test procedure cannot be used. However, if only $k$ is outside the indicated range, it is possible to use the available tables to obtain the estimates $p_1, \ldots, p_t$, or at least a good first approximation to them, depending on whether or not there exists an integer $c$ which divides the rank sums and $k$ evenly and which is such that $k/c$ is within the above indicated range for $k$. This technique is discussed in detail by Terry et al. (1952) and by Bradley (1954b).

If $t > 5$, the available tables will not be of assistance in obtaining approximations to $p_1, \ldots, p_t$ except in special cases. Hence, to obtain estimates $p_1, \ldots, p_t$ in this case, it will be necessary to solve equations (6-4). Some methods aiding in the solution of equations of this form are suggested by Dykstra (1956) and by Bradley & Terry (1952).
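As an illustration of how equations of the form (6-4) might be solved numerically, the following Python sketch uses a simple fixed-point iteration obtained by rearranging (6-4). It is not claimed to be the specific procedure of Dykstra (1956) or of Bradley & Terry (1952). The function names, the starting values and the tolerance are illustrative assumptions; the inputs are the values (9-4) of Example 1 in § 9.

```python
import math

def solve_estimates(a_star, k, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration for a system of the form (6-4).

    a_star[i] plays the role of a_i^* and k is the number of times each
    pair of objects is compared over all judges. All a_star[i] must be > 0.
    """
    t = len(a_star)
    p = [1.0 / t] * t                      # start from the null-hypothesis values
    for _ in range(max_iter):
        new = []
        for i in range(t):
            denom = k * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i)
            new.append(a_star[i] / denom)  # rearranged from a_i^*/p_i = k * sum(...)
        total = sum(new)
        new = [x / total for x in new]     # enforce sum(p) = 1
        if max(abs(x - y) for x, y in zip(new, p)) < tol:
            return new
        p = new
    return p

def b_statistic(a_star, k, p):
    """B of the form (6-6), with common (base-10) logarithms."""
    t = len(p)
    pair_term = sum(math.log10(p[i] + p[j])
                    for i in range(t) for j in range(i + 1, t))
    return k * pair_term - sum(a * math.log10(q) for a, q in zip(a_star, p))

def t_statistic(v, r, b):
    """T of the form (6-11): 2 v r ln 2 - 2 B ln 10."""
    return 2 * v * r * math.log(2) - 2 * b * math.log(10)

a_star = [9, 9, 5, 2, 5]                   # the values (9-4) of Example 1
p_hat = solve_estimates(a_star, k=3)
B1 = b_statistic(a_star, 3, p_hat)
T1 = t_statistic(v=6, r=5, b=B1)
```

Under these assumptions the iteration should agree with the tabled values quoted in Example 1 to the accuracy shown there. An object with $a_i^* = 0$ has to be removed before the iteration is applied, as described in § 8.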

Once $p_1, \ldots, p_t$ have been obtained, $B^{(1)}$ can be evaluated from (6-6), but the significance level of $B^{(1)}$ can only be approximately determined. When this is the case, the test will be in terms of the statistic

$$T^{(1)} = -2 \ln \lambda_1 = 2vr \ln 2 - 2B^{(1)} \ln 10, \qquad \text{(6-11)}$$

the discussion of which is reserved for § 7.
Test II. Suppose that in (6-1) the $\pi_i$'s of the alternative hypothesis are divided into two groups, the elements within each group being equal. That is, we desire to test the null hypothesis

$$H_0\colon \pi_{iu} = 1/t \quad (i = 1, \ldots, t;\ u = 1, \ldots, v)$$

against the alternative hypothesis

$$H_2\colon \pi_{iu} = \begin{cases} \pi & (i = 1, \ldots, s), \\ \dfrac{1 - s\pi}{t - s} & (i = s+1, \ldots, t). \end{cases} \qquad \text{(6-12)}$$


For this situation, the equations (6-4) can be solved explicitly for $p$, the maximum-likelihood estimate of $\pi$, which, using (2-1), is

$$p = \frac{ks(4t - s - 3) - 2 \sum_{i=1}^{s} \sum_j \sum_u n_{iju} r_{iju}}{ks(5st - 2t^2 - 6s + 3t) - 2(2s - t) \sum_{i=1}^{s} \sum_j \sum_u n_{iju} r_{iju}}. \qquad \text{(6-13)}$$


However, this expression diminishes in importance if we proceed in a manner analogous to that used by Bradley (1955) and let $X$ denote the number of times an object of the first group of $s$ objects ranks above an object of the second group of $(t-s)$ objects. Then

$$p = \frac{X}{ks(t-s)^2 + (2s-t)X}, \qquad \text{(6-14)}$$

and the test procedure for (6-12) reduces to that of the binomial or sign test based on the $ks(t-s)$ comparisons of the objects of the first group with objects of the second group.

Test III. Suppose we have g groups with v judges in each group. We are interested in 
investigating the equality of the true treatment ratings of the t objects under the assumption 
of within-group judge consistency, but not necessarily assuming between-group judge 
consistency. It should be observed that these g groups could contain the same v judges, 
but due to some additional feature, such as significant time lag between repetitions, or 
training gained from continued experimentation, we are unwilling to assume between-group 
judge consistency. 

Let $\pi_{1u}^m, \ldots, \pi_{tu}^m$ represent the true ratings of the $t$ objects corresponding to the $u$th judge in the $m$th group $(u = 1, \ldots, v;\ m = 1, \ldots, g)$. Then, we wish to test the null hypothesis,

$$H_0\colon \pi_{iu}^m = 1/t,$$

against the alternative hypothesis,

$$H_3\colon \pi_{iu}^m = \pi_i^m \quad (i = 1, \ldots, t;\ u = 1, \ldots, v;\ m = 1, \ldots, g). \qquad \text{(6-15)}$$

Let $n_{iju}^m = 1$ or 0 depending on whether or not the $i$th and $j$th objects are compared by the $u$th judge of the $m$th group $(i \ne j;\ i, j = 1, \ldots, t;\ u = 1, \ldots, v;\ m = 1, \ldots, g)$, and conventionally take $n_{iiu}^m = 0$ $(i = 1, \ldots, t;\ u = 1, \ldots, v;\ m = 1, \ldots, g)$. It is clear that the corresponding incidence matrices

$$N_u^m = (n_{iju}^m) \quad (m = 1, \ldots, g;\ u = 1, \ldots, v),$$

will have analogous properties to those given in (3-4).

Let $r_{iju}^m$ be the rank assigned to the $i$th object if compared with the $j$th object by the $u$th judge of the $m$th group. Then, corresponding to (3-2), we have

$$r_{iju}^m + r_{jiu}^m = 3 \quad (m = 1, \ldots, g;\ u = 1, \ldots, v),$$

when the $i$th and $j$th objects are compared by the $u$th judge of the $m$th group.










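Now, under the assumption of probability independence between pairs of objects, judges and groups of judges, and by defining

$$a_{im}^* = 2\alpha v - \sum_j \sum_u n_{iju}^m r_{iju}^m, \qquad \text{(6-16)}$$

we obtain

$$(\ln L_{(3)} \mid H_3) = \sum_m \Big\{ \sum_i a_{im}^* \ln \pi_i^m - k \sum_{i<j} \ln (\pi_i^m + \pi_j^m) \Big\}, \qquad \text{(6-17)}$$

$$(\ln L_{(3)} \mid H_0) = -vrg \ln 2, \qquad \text{(6-18)}$$

where $L_{(3)}$ is the likelihood function, and $(\ln L_{(3)} \mid H_i)$ denotes the natural logarithm of the likelihood function when the hypothesis $H_i$ is true $(i = 0, 3)$. Then, if we denote the likelihood ratio by $\lambda_3$ and define

$$B_m^{(3)} = k \sum_{i<j} \log (p_i^m + p_j^m) - \sum_i a_{im}^* \log p_i^m, \qquad \text{(6-19)}$$

where $p_1^m, \ldots, p_t^m$, the maximum-likelihood estimates of $\pi_1^m, \ldots, \pi_t^m$, are solutions of the system of equations

$$\frac{a_{im}^*}{p_i^m} - k \sum_j{}' (p_i^m + p_j^m)^{-1} = 0 \quad (i = 1, \ldots, t), \qquad \sum_i p_i^m = 1 \quad (m = 1, \ldots, g), \qquad \text{(6-20)}$$

and define

$$B^{(3)} = \sum_m B_m^{(3)}, \qquad \text{(6-21)}$$

and

$$T_m^{(3)} = 2vr \ln 2 - 2B_m^{(3)} \ln 10, \qquad \text{(6-22)}$$

we obtain

$$T^{(3)} = -2 \ln \lambda_3 = \sum_m T_m^{(3)}. \qquad \text{(6-23)}$$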


The probability of a specified value $B$ of $B^{(3)}$ can be obtained by determining the probabilities of the joint occurrences of all combinations of $B_1^{(3)}, \ldots, B_g^{(3)}$ such that $\sum_m B_m^{(3)} = B$, and then summing these probabilities. Clearly, the distribution of $B_m^{(3)}$ $(m = 1, \ldots, g)$ under $H_0$ is the same as the distribution of $B^{(1)}$ under $H_0$. Hence, the distribution of $B^{(3)}$ under $H_0$ can be obtained from the distribution of $B^{(1)}$ under $H_0$. Tables of this distribution for $t = 3$, $g, k = 2, \ldots, 5$ are given by Bradley & Terry (1952), and for $t = 4$, $g, k = 2$ by Wilkinson (1956).

The test procedure for (6-15) is then the following: Compute the rank sums

$$\sum_j \sum_u n_{iju}^m r_{iju}^m \quad (i = 1, \ldots, t),$$

corresponding to the $m$th group of judges $(m = 1, \ldots, g)$. Then, if the design parameters are within the range $t = 3$, $k = 1, \ldots, 10$; $t = 4$, $k = 1, \ldots, 8$; $t = 5$, $k = 1, \ldots, 5$, determine $B_m^{(3)}$ from the previously indicated tables. Sum these $B_m^{(3)}$'s with respect to $m$, obtaining $B^{(3)}$. If $t = 3$, $g, k = 2, \ldots, 5$; $t = 4$, $g, k = 2$, the exact significance level is given in the above-mentioned tables. Otherwise, without further extension of these tables, an exact test of (6-15) will not be available. When this is the case, the test will be in terms of the statistic $T^{(3)}$ of (6-23), the approximate significance level of which can be determined as indicated in § 7.

For large $t$, $g$, $k$, it will be necessary to solve the equations in (6-20) for $p_1^m, \ldots, p_t^m$ $(m = 1, \ldots, g)$, and then to evaluate $B^{(3)}$ using (6-19) and (6-21).







Test IV. It is conceivable that the $g$ groups of $v$ judges considered in Test III could have had the same judges in each group. For this situation, if we are willing to assume consistency of the judges from one repetition to the next, we may wish to test the null hypothesis,

$$H_0\colon \pi_{iu}^m = 1/t,$$

against the alternative hypothesis,

$$H_4\colon \pi_{iu}^m = \pi_i \quad (i = 1, \ldots, t;\ u = 1, \ldots, v;\ m = 1, \ldots, g). \qquad \text{(6-24)}$$

The likelihood ratio test for (6-24) is provided by computing the rank sums

$$\sum_m \sum_j \sum_u n_{iju}^m r_{iju}^m \quad (i = 1, \ldots, t),$$

and

$$B^{(4)} = gk \sum_{i<j} \log (p_i + p_j) - \sum_i a_{i\cdot}^* \log p_i, \qquad \text{(6-25)}$$

where $a_{i\cdot}^* = \sum_m a_{im}^*$, and $p_1, \ldots, p_t$, the maximum-likelihood estimates of $\pi_1, \ldots, \pi_t$, are solutions of the equations

$$\frac{a_{i\cdot}^*}{p_i} - gk \sum_j{}' (p_i + p_j)^{-1} = 0 \quad (i = 1, \ldots, t), \qquad \sum_i p_i = 1.$$

$B^{(4)}$ corresponds to $B^{(1)}$ with $k$ replaced by $gk$. Thus, for $t$ and $gk$ within the range of available tables, $B^{(4)}$ and its exact significance level can be obtained directly from these tables. When either $t$ or $gk$ is outside the range of available tables, an exact test is not available. In this case, the test will be in terms of the statistic

$$T^{(4)} = 2vrg \ln 2 - 2B^{(4)} \ln 10, \qquad \text{(6-26)}$$

the approximate significance level of which can be determined as indicated in § 7.

It should be noted that there are many ways in which $g$ sets of $r$ pairs could be assigned to each of the judges. No matter how the sets of pairs are assigned, the statistic $B^{(4)}$ will have the same distribution under the null hypothesis $H_0$ of (6-24). This would suggest that simplicity of assignment of the $g$ sets to the $v$ judges would be the popular criterion. However, it is felt that it would be better to have each judge compare as many different sets of $r$ pairs as possible, as this may provide greater connectivity. The investigation of an exact criterion is desirable, but will not be considered at this time.

Test V. For the situation of Test III we now wish to test for judge consistency between groups; that is, we wish to test the null hypothesis,

$$H_0\colon \pi_{iu}^m = \pi_i,$$

against the alternative hypothesis,

$$H_5\colon \pi_{iu}^m = \pi_i^m \quad (i = 1, \ldots, t;\ u = 1, \ldots, v;\ m = 1, \ldots, g).$$

The likelihood-ratio test is in terms of the statistic

$$T^{(5)} = 2(B^{(4)} - B^{(3)}) \ln 10, \qquad \text{(6-27)}$$

the distribution of which under the null hypothesis $H_0$ will depend on $\pi_1, \ldots, \pi_t$. Hence, $T^{(5)}$ will not provide an exact parameter-free test, but will supply an approximate test as discussed in § 7.
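The statistic of (6-27) can also be written as a difference of the statistics of Tests III and IV. The following short derivation is a sketch that uses only (6-21) to (6-23) and (6-26), with $B_m^{(3)}$, $T_m^{(3)}$, $B^{(3)}$, $B^{(4)}$, $T^{(3)}$, $T^{(4)}$ and $T^{(5)}$ denoting the quantities defined there:

```latex
\begin{align*}
T^{(3)} &= \sum_{m=1}^{g} \bigl( 2vr \ln 2 - 2B_m^{(3)} \ln 10 \bigr)
         = 2vrg \ln 2 - 2B^{(3)} \ln 10, \\
T^{(4)} &= 2vrg \ln 2 - 2B^{(4)} \ln 10, \\
T^{(3)} - T^{(4)} &= 2\bigl( B^{(4)} - B^{(3)} \bigr) \ln 10 = T^{(5)} .
\end{align*}
```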










It is natural to desire a test concerning the true ratings of the $t$ objects without assuming consistency of the judges as a group; that is, we would like to have a test of the null hypothesis,

$$H_0\colon \pi_{iu} = 1/t,$$

against the alternative hypothesis,

$$H\colon \text{the } \pi_{iu}\text{'s are not all equal.}$$


The likelihood ratio could be determined but its distribution would depend on the paired comparison design being used, the number $g$ of repetitions made by each judge, and the criterion for assigning the $g$ sets of $r$ pairs to each judge. The assignment of these sets is, in itself, a design problem. However, it is conceivable that with the assignment of these sets to satisfy certain additional symmetry conditions, tabulation of the distribution for small values of $t$ would be practicable.

Of course, if $g = cv$, $c > 1$, such that each judge compares each of all the possible $v$ different sets of $r$ pairs $c$ times, the situation reduces to the equivalent of each judge making the comparison of $ck$ sets of all possible $\tfrac{1}{2}t(t-1)$ pairs. A test for this is given by Bradley & Terry (1952).


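7. LARGE-SAMPLE DISTRIBUTIONS

Let

$$y_i = t\Big(p_i - \frac{1}{t}\Big) \quad (i = 1, \ldots, t), \qquad \text{(7-1)}$$

where $p_1, \ldots, p_t$ are the maximum-likelihood estimates of $\pi_1, \ldots, \pi_t$ given in (6-1). Now observe that

$$\sum_i \sum_j \sum_u n_{iju} r_{iju} = \tfrac{3}{2}kt(t-1);$$

hence,

$$\sum_i a_i^* = 2\alpha v t - \tfrac{3}{2}kt(t-1) = \tfrac{1}{2}kt(t-1),$$

where $a_i^*$ is defined in (5-4). Now, upon substitution in (6-6) and (6-11), we obtain

$$B^{(1)} = \tfrac{1}{2}kt(t-1) \log 2 + k \sum_{i<j} \log \{1 + \tfrac{1}{2}(y_i + y_j)\} - \sum_i a_i^* \log (1 + y_i), \qquad \text{(7-2)}$$

and

$$T^{(1)} = -2k \sum_{i<j} \ln \{1 + \tfrac{1}{2}(y_i + y_j)\} + 2 \sum_i a_i^* \ln (1 + y_i), \qquad \text{(7-3)}$$

respectively. Substitution of (7-1) in (6-4) yields

$$a_i^* = \tfrac{1}{2}k(1 + y_i) \sum_j{}' \{1 + \tfrac{1}{2}(y_i + y_j)\}^{-1},$$

and hence, upon substitution for $a_i^*$ in (7-3), we obtain

$$T^{(1)} = k \sum_i (1 + y_i) \ln (1 + y_i) \sum_j{}' \{1 + \tfrac{1}{2}(y_i + y_j)\}^{-1} - 2k \sum_{i<j} \ln \{1 + \tfrac{1}{2}(y_i + y_j)\}. \qquad \text{(7-4)}$$

$T^{(1)}$, as expressed in (7-4), can be put in the form

$$T^{(1)} = \tfrac{1}{4}kt \sum_i y_i^2 + R(y_i), \qquad \text{(7-5)}$$

where $R(y_i)$ depends on higher powers of $y_i$ than the second power. Then, in the manner followed by Bradley (1955), if we redefine

$$\pi_i = \frac{1}{t} + \frac{\delta_{ik}}{\sqrt{k}} \quad (i = 1, \ldots, t),$$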







where $\delta_{ik}$ is a sequence of constants converging to $\delta_i$ as $k \to \infty$, it can be shown that $R(y_i)$ converges to zero in probability as $k \to \infty$, and that $T^{(1)}$ has the same limiting distribution as $\tfrac{1}{4}kt \sum_i y_i^2$. This limiting distribution is, under $H_1$, a non-central $\chi^2$-distribution with $(t-1)$ degrees of freedom and parameter of non-centrality

$$\lambda = \tfrac{1}{4}t^3 \sum_i \delta_i^2,$$

which, for large $k$, can be approximated by

$$\lambda = \tfrac{1}{4}kt^3 \sum_i \Big(\pi_i - \frac{1}{t}\Big)^2. \qquad \text{(7-7)}$$

Under the null hypothesis $H_0$, $\pi_i = 1/t$, $\delta_{ik} = \delta_i = 0$ $(i = 1, \ldots, t)$. Therefore $\lambda = 0$ and thus, under $H_0$, $T^{(1)}$ has a limiting central $\chi^2$-distribution with $(t-1)$ degrees of freedom.

Similarly, it can be shown that $T^{(3)}$, $T^{(4)}$ and $T^{(5)}$ have, under their respective null hypotheses, limiting central $\chi^2$-distributions with $g(t-1)$, $(t-1)$ and $(g-1)(t-1)$ degrees of freedom, respectively. Thus approximate significance levels for the indicated tests can be obtained from tables for the $\chi^2$-distribution with appropriate numbers of degrees of freedom.
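Where $\chi^2$ tables are not at hand, the upper-tail probability can be computed directly when the number of degrees of freedom is even, as it is for $t = 5$ in the examples of § 9. The following Python sketch uses the standard closed form $e^{-x/2} \sum_{j<m} (x/2)^j / j!$ for $2m$ degrees of freedom; the function name is an illustrative assumption, and odd degrees of freedom are deliberately not handled.

```python
import math

def chi2_upper_tail_even(x, df):
    """P(chi-square variate with an even number df of degrees of freedom exceeds x)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j        # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

level_test_1 = chi2_upper_tail_even(10.80, 4)    # statistic and df of Example 1
level_test_3 = chi2_upper_tail_even(26.609, 8)   # statistic and df of Example 3
level_test_5 = chi2_upper_tail_even(1.256, 4)    # statistic and df of Example 4
```

For these inputs the sketch should give values close to the approximate significance levels quoted in Examples 1, 3 and 4.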


8. EXTREME SETS OF RANK SUMS

For the situation in Test I, the extreme values which the sum of ranks corresponding to the $i$th object may have are

$$\sum_j \sum_u n_{iju} r_{iju} = 2\alpha v \ \text{ or } \ \alpha v \quad (i = 1, \ldots, t).$$

When either of these cases arises for any of the objects, equations in (6-4), which provide our maximum-likelihood estimates for the parameters, become somewhat valueless. In order for this rank sum to be $2\alpha v$ (or $\alpha v$), it is necessary that the $i$th object be judged inferior (or superior) in all comparisons made by all judges. When these situations arise, the procedure is to estimate $\pi_i$ by 0 (or 1); then, drop this object from the analysis and consider only the remaining $t-1$ objects.

Let us consider the case where the $t$th object has the extreme sum of ranks

$$\sum_j \sum_u n_{tju} r_{tju} = 2\alpha v.$$

Omitting this object from consideration, we will have a reduced set of $(t-1)$ rank sums, namely,

$$\sum_j \sum_u n_{1ju} r_{1ju} - k, \ \ldots, \ \sum_j \sum_u n_{t-1,ju} r_{t-1,ju} - k. \qquad \text{(8-1)}$$

Now, from the symmetry of the paired comparison designs, we obtain the logarithm of the likelihood function $L_{(1)}$ under the alternative hypothesis $H_1$ of (6-1) to be

$$(\ln L_{(1)} \mid H_1) = \sum_{i=1}^{t-1} \Big\{ 2k(t-2) - \sum_{j=1}^{t-1} \sum_u n_{iju} r_{iju} \Big\} \ln \pi_i - k \sum_{i<j} \ln (\pi_i + \pi_j).$$

It also follows easily that the system of equations, which provides maximum-likelihood estimates of $\pi_1, \ldots, \pi_{t-1}$, and the test statistic $B^{(1)}$ will be the same as in (6-4) and (6-6), respectively, with $t$ replaced by $t-1$.










Hence, the test procedure for (6-1) will be the same as that previously described with $t$ replaced by $t-1$, and with the original set of rank sums replaced by the reduced set (8-1).

For the case where the $t$th object has the extreme rank sum

$$\sum_j \sum_u n_{tju} r_{tju} = \alpha v,$$

the omission of this object from consideration will result in the reduced set of $(t-1)$ rank sums

$$\sum_j \sum_u n_{1ju} r_{1ju} - 2k, \ \ldots, \ \sum_j \sum_u n_{t-1,ju} r_{t-1,ju} - 2k. \qquad \text{(8-2)}$$

Similarly, the test procedure for (6-1), in this case, is the same as that previously described, with $t$ replaced by $(t-1)$, and with the original set of rank sums replaced by the reduced set (8-2).
In Test IV, for the case where the $t$th object has the extreme rank sum

$$\sum_m \sum_j \sum_u n_{tju}^m r_{tju}^m = 2\alpha g v \ \text{ or } \ \alpha g v, \qquad \text{(8-3)}$$

the test procedure for (6-24) is the same as previously described, with $t$ replaced by $(t-1)$, and with the original set of rank sums replaced by

$$\sum_m \sum_j \sum_u n_{1ju}^m r_{1ju}^m - gk, \ \ldots, \ \sum_m \sum_j \sum_u n_{t-1,ju}^m r_{t-1,ju}^m - gk,$$

or

$$\sum_m \sum_j \sum_u n_{1ju}^m r_{1ju}^m - 2gk, \ \ldots, \ \sum_m \sum_j \sum_u n_{t-1,ju}^m r_{t-1,ju}^m - 2gk, \qquad \text{(8-4)}$$

depending on whether the $t$th object rank sum is $2\alpha g v$ or $\alpha g v$, respectively.

For Tests III and V, if an extreme rank sum occurs for the $t$th object of the $m$th group, namely,

$$\sum_j \sum_u n_{tju}^m r_{tju}^m = 2\alpha v \ \text{ or } \ \alpha v,$$

the respective test procedures can be employed with $t$ replaced by $(t-1)$, and with the original set of rank sums replaced by the reduced set of rank sums,

$$\sum_j \sum_u n_{1ju}^m r_{1ju}^m - k, \ \ldots, \ \sum_j \sum_u n_{t-1,ju}^m r_{t-1,ju}^m - k,$$

or

$$\sum_j \sum_u n_{1ju}^m r_{1ju}^m - 2k, \ \ldots, \ \sum_j \sum_u n_{t-1,ju}^m r_{t-1,ju}^m - 2k.$$

Such an alteration to the test procedure can be extended when more than one of the $g$ groups has an extreme rank sum. For Test V, if the $t$th object in all $g$ groups has an extreme rank sum as in (8-3), in order to evaluate $B^{(4)}$ we will need to follow the procedure described in Test IV, with $t$ replaced by $t-1$, and to employ the reduced set of rank sums of (8-4).


9. EXAMPLES TO ILLUSTRATE THE PROCEDURES OF ANALYSIS†

† The data used in these examples is a small portion of that collected for an experiment conducted at the Psychometric Laboratory of the University of North Carolina. It is used to illustrate the procedures of analysis only.

Example 1. Consider $t = 5$ specimens of handwriting compared pairwise by $v = 6$ judges, with each judge recording a preference for one object of each pair which he compares. We will refer to the factors which influence judgement as characteristic $x$. The $r$ pairs compared by the $u$th judge $(u = 1, \ldots, 6)$ will be dictated by the field plan of design (2), Table 1, given by Bose (1956). The incidence matrices for this design are:

















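$$N_1 = \begin{pmatrix} 0&0&1&1&0\\ 0&0&0&1&1\\ 1&0&0&0&1\\ 1&1&0&0&0\\ 0&1&1&0&0 \end{pmatrix}, \qquad N_2 = \begin{pmatrix} 0&0&0&1&1\\ 0&0&1&0&1\\ 0&1&0&1&0\\ 1&0&1&0&0\\ 1&1&0&0&0 \end{pmatrix},$$

$$N_3 = \begin{pmatrix} 0&1&0&1&0\\ 1&0&1&0&0\\ 0&1&0&0&1\\ 1&0&0&0&1\\ 0&0&1&1&0 \end{pmatrix}, \qquad N_4 = \begin{pmatrix} 0&1&0&0&1\\ 1&0&0&1&0\\ 0&0&0&1&1\\ 0&1&1&0&0\\ 1&0&1&0&0 \end{pmatrix}, \qquad \text{(9-1)}$$

$$N_5 = \begin{pmatrix} 0&1&1&0&0\\ 1&0&0&0&1\\ 1&0&0&1&0\\ 0&0&1&0&1\\ 0&1&0&1&0 \end{pmatrix}, \qquad N_6 = \begin{pmatrix} 0&0&1&0&1\\ 0&0&1&1&0\\ 1&1&0&0&0\\ 0&1&0&0&1\\ 1&0&0&1&0 \end{pmatrix}.$$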
Now, for judge $u$ $(u = 1, \ldots, v)$, we construct a $(t \times t)$ preference matrix $R_u = (n_{iju} r_{iju})$, where $r_{iju} = 1$ or 2 if $(a_i \to a_j \mid u)$ or $(a_i \leftarrow a_j \mid u)$, respectively. The preference matrix $R_u$ will have 1's and 2's as elements corresponding to the unit elements of $N_u$, and zeros corresponding to the zero elements of $N_u$. From this definition of $R_u$, we observe that the sums of the rows of $R_u$ yield a $(t \times 1)$ vector, the elements of which are the rank sums of the $t$ objects corresponding to judge $u$. For example, the preference matrix $R_1$ and the rank sums vector corresponding to judge $J_1$ for this experiment are

$$R_1 = \begin{pmatrix} 0&0&2&1&0\\ 0&0&0&1&1\\ 1&0&0&0&2\\ 2&2&0&0&0\\ 0&2&1&0&0 \end{pmatrix}, \qquad R_1 \mathbf{1} = \begin{pmatrix} 3\\ 2\\ 3\\ 4\\ 3 \end{pmatrix} = \begin{pmatrix} \sum_j n_{1j1} r_{1j1}\\ \sum_j n_{2j1} r_{2j1}\\ \sum_j n_{3j1} r_{3j1}\\ \sum_j n_{4j1} r_{4j1}\\ \sum_j n_{5j1} r_{5j1} \end{pmatrix},$$
j 


where $\mathbf{1}$ is a unit column vector of appropriate dimension. By this procedure we obtain for each judge the following rank sums vectors:

$J_1$: (3, 2, 3, 4, 3),
$J_2$: (2, 2, 4, 3, 4),
$J_3$: (2, 4, 3, 4, 2),        (9-2)
$J_4$: (2, 3, 3, 3, 4),
$J_5$: (4, 2, 2, 4, 3),
$J_6$: (2, 2, 4, 4, 3).
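The construction of the preference matrices and rank sums vectors can be sketched in Python as follows. The individual preferences listed here are an assumption: they were reconstructed so as to agree with the pairs of design (9-1) and the rank sums vectors (9-2), not copied from a data table, and the names `PREFERENCES`, `preference_matrix` and `rank_sums` are illustrative.

```python
# Winner of each compared pair, per judge: {(i, j): winner}, objects numbered 1..5.
# ASSUMPTION: reconstructed to agree with the design (9-1) and the rank sums (9-2).
PREFERENCES = [
    {(1, 3): 3, (1, 4): 1, (2, 4): 2, (2, 5): 2, (3, 5): 5},  # J1
    {(1, 4): 1, (1, 5): 1, (2, 3): 2, (2, 5): 2, (3, 4): 4},  # J2
    {(1, 2): 1, (1, 4): 1, (2, 3): 3, (3, 5): 5, (4, 5): 5},  # J3
    {(1, 2): 1, (1, 5): 1, (2, 4): 2, (3, 4): 4, (3, 5): 3},  # J4
    {(1, 2): 2, (1, 3): 3, (2, 5): 2, (3, 4): 3, (4, 5): 5},  # J5
    {(1, 3): 1, (1, 5): 1, (2, 3): 2, (2, 4): 2, (4, 5): 5},  # J6
]
T_OBJECTS = 5

def preference_matrix(prefs, t=T_OBJECTS):
    """R_u: entry (i, j) is 1 if i is preferred to j, 2 if j is preferred, 0 if not compared."""
    R = [[0] * t for _ in range(t)]
    for (i, j), winner in prefs.items():
        R[i - 1][j - 1] = 1 if winner == i else 2
        R[j - 1][i - 1] = 1 if winner == j else 2
    return R

def rank_sums(R):
    """Row sums of R_u, i.e. the rank sums vector R_u 1."""
    return [sum(row) for row in R]

per_judge = [rank_sums(preference_matrix(p)) for p in PREFERENCES]
totals = [sum(col) for col in zip(*per_judge)]   # rank sums pooled over judges
alpha, v = 2, len(PREFERENCES)                   # each object meets alpha = 2 others per judge
a_star = [2 * alpha * v - s for s in totals]     # a_i^* = 2*alpha*v - pooled rank sum
```

Each $a_i^*$ obtained this way is simply the number of preferences recorded for object $a_i$, which is why it can be fed directly into the estimating equations (6-4).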











Summing these with respect to judges yields

$$\Big(\sum_j \sum_u n_{1ju} r_{1ju}, \ldots, \sum_j \sum_u n_{5ju} r_{5ju}\Big) = (15, 15, 19, 22, 19). \qquad \text{(9-3)}$$

Hence

$$(a_1^*, \ldots, a_5^*) = (9, 9, 5, 2, 5), \qquad \text{(9-4)}$$


where $a_i^*$ is defined in (5-4). Entering the tables of Bradley (1954b) for $t = 5$ and for $k = 3$ complete repetitions, we obtain, corresponding to the rank sums (9-3),

$$(p_1, \ldots, p_5) = (0.38, 0.38, 0.10, 0.03, 0.10), \qquad B^{(1)} = 6.686,$$

$$P\{B^{(1)} \le 6.686 \mid H_0\} = 0.0404.$$

Hence, we would conclude, at the 0.0404 level of significance, that the five handwriting specimens are different with respect to the characteristic $x$ under the assumption that the judges are consistent as a group.

It is of interest to note, from (6-11), that

$$T^{(1)} = 2vr \ln 2 - 2B^{(1)} \ln 10 = 10.80 \sim \chi^2_4.$$

From the large-sample properties of $T^{(1)}$, we obtain the approximate significance level to be $P(\chi^2_4 > 10.80 \mid H_0) = 0.028$. Thus, although the approximate test will give too many significant results, the approximate significance level obtained is reasonably close to the exact significance level, even for $k$ as small as 3.

Example 2. Suppose that the six judges of Example 1 make two incomplete repetitions with $r$ pairs in each repetition. The $r$ pairs compared by the $u$th judge in the first repetition are those indicated in (9-1), and, hence, the incidence matrices for the first repetition are

$$N_u^1 = N_u \quad (u = 1, \ldots, 6).$$

The $r$ pairs compared by the $u$th judge in the second repetition are indicated by the incidence matrices

$$N_u^2 = N_{u+1} \quad (u = 1, \ldots, 5), \qquad N_6^2 = N_1.$$

The rank sums vectors corresponding to each judge for the first repetition are given in (9-2). By an analogous procedure to that used in Example 1, we compute the rank sums vectors corresponding to each judge on the second repetition:

[Rank sums vectors for $J_1, \ldots, J_6$ on the second repetition; entries not legible.]        (9-5)

Summing these with respect to judges, we obtain

$$\Big(\sum_j \sum_u n_{1ju}^2 r_{1ju}^2, \ldots, \sum_j \sum_u n_{5ju}^2 r_{5ju}^2\Big) = (14, 15, 18, 22, 21). \qquad \text{(9-6)}$$

Hence

$$(a_{12}^*, \ldots, a_{52}^*) = (10, 9, 6, 2, 3), \qquad \text{(9-7)}$$





where $a_{i2}^*$ is defined in (6-19). By summing the vectors in (9-6) and (9-3), we obtain

$$\Big(\sum_m \sum_j \sum_u n_{1ju}^m r_{1ju}^m, \ldots, \sum_m \sum_j \sum_u n_{5ju}^m r_{5ju}^m\Big) = (29, 30, 37, 44, 40). \qquad \text{(9-8)}$$

Hence, we obtain

$$(a_{1\cdot}^*, \ldots, a_{5\cdot}^*) = (19, 18, 11, 4, 8). \qquad \text{(9-9)}$$

The result in (9-9) can be checked by adding vectors in (9-7) and (9-4). The elements of the vector (9-8) are the rank sums which enter into the statistic $B^{(4)}$ of (6-25). Here $t = 5$ and $gk = 6$, the corresponding number of complete repetitions, are outside the range of available tables. Hence we are unable to obtain either the value of $B^{(4)}$ or its significance level directly from the tables. However, since $t = 5$, we can use a procedure outlined by Bradley (1954b) and Terry et al. (1952) to assist in evaluating $B^{(4)}$. Dividing $gk$ and the elements of (9-8) by two, we obtain

$$\tfrac{1}{2}gk = 3, \qquad \tfrac{1}{2}\,\text{(9-8)} = (14.5, 15, 18.5, 22, 20). \qquad \text{(9-10)}$$

Then, for rank sums vectors

(14, 15, 19, 22, 20) and (15, 15, 18, 22, 20),

we enter the tables for $t = 5$ and for three complete repetitions and obtain corresponding estimates

(0.51, 0.33, 0.08, 0.02, 0.05) and (0.38, 0.38, 0.14, 0.03, 0.07),

respectively. Then, by linear interpolation, we obtain an approximation to the estimates $p_1, \ldots, p_5$ corresponding to the elements of (9-10) to be

(0.445, 0.355, 0.110, 0.025, 0.060),

which, after adjustment to add to 1, is

(0.447, 0.357, 0.111, 0.025, 0.060).

Starting with these approximations and using the iterative formula given by Bradley & Terry (1952), we obtain

$$(p_1, \ldots, p_5) = (0.441, 0.361, 0.106, 0.029, 0.063).$$

Substituting these values in (6-25), we obtain $B^{(4)} = 12.556$. This value substituted in (6-26) yields $T^{(4)} = 25.353$, and since $T^{(4)}$ has approximately a $\chi^2$-distribution, under $H_0$, with $t - 1 = 4$ degrees of freedom, it follows that $P(T^{(4)} > 25.353 \mid H_0) < 0.0005$, approximately. Hence, under the assumption that the judges are consistent as a group and consistent from one repetition to the next, we would conclude, at less than the 0.0005 level of significance (approximately), that the five handwriting specimens are different with respect to the characteristic $x$.
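The interpolation step used above to obtain starting values can be written out as a small Python sketch. The function name and the equal weighting argument are illustrative assumptions; the two input vectors are the tabled estimates quoted in the example.

```python
def interpolate_start(p_low, p_high, weight=0.5):
    """Linearly interpolate between two tabled estimate vectors, then rescale to sum to 1."""
    mixed = [(1 - weight) * a + weight * b for a, b in zip(p_low, p_high)]
    total = sum(mixed)
    return mixed, [m / total for m in mixed]

raw, start = interpolate_start(
    [0.51, 0.33, 0.08, 0.02, 0.05],   # tabled estimates for rank sums (14, 15, 19, 22, 20)
    [0.38, 0.38, 0.14, 0.03, 0.07],   # tabled estimates for rank sums (15, 15, 18, 22, 20)
)
```

The weight 0.5 reflects that the halved rank sums (14.5, 15, 18.5, 22, 20) lie midway between the two tabled rank sums vectors; `start` is then a suitable initial vector for an iterative solution of the estimating equations.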

Example 3. Suppose that in the previous example we were unwilling to assume judge consistency from one repetition to the next. The situation then becomes that for Test III. Thus we have $g = 2$ groups of $v = 6$ judges comparing $t$ objects according to a characteristic $x$, and, with reference to this characteristic, the judges in each group would not be considered the same.

Using the data of Examples 1 and 2 for each repetition separately, we obtain

$$(p_1^1, \ldots, p_5^1) = (0.38, 0.38, 0.10, 0.03, 0.10), \qquad B_1^{(3)} = 6.686, \qquad T_1^{(3)} = 10.800,$$

$$(p_1^2, \ldots, p_5^2) = (0.51, 0.33, 0.10, 0.02, 0.03), \qquad B_2^{(3)} = 5.598, \qquad T_2^{(3)} = 15.809.$$










Hence, from (6-23), $T^{(3)} = 10.800 + 15.809 = 26.609$. Since $T^{(3)}$ has, under $H_0$, approximately a $\chi^2$-distribution with $g(t-1) = 8$ degrees of freedom, an approximate significance level for Test III is $P(T^{(3)} > 26.609 \mid H_0) = 0.001$. Hence, under the assumption of within-group judge consistency only, we conclude, at approximately the 0.001 level of significance, that the five handwriting specimens are different with respect to the characteristic $x$.

Example 4. Suppose for the experiment used in Examples 2 and 3 we now desire to test for the consistency from one group of repetitions to the next. Then Test V is the one needed. From (6-27) we have the test statistic

$$T^{(5)} = 2(B^{(4)} - B^{(3)}) \ln 10 = T^{(3)} - T^{(4)} = 1.256.$$

Since $T^{(5)}$ has, under $H_0$, approximately a $\chi^2$-distribution with $(g-1)(t-1) = 4$ degrees of freedom, the approximate significance level for Test V, in this case, is

$$P(T^{(5)} > 1.256 \mid H_0) = 0.86.$$

Hence, under the assumption of within-group judge consistency, we have no evidence to doubt the hypothesis that the group judging criterion is the same from one repetition to the next.

Example 5. Suppose that in Example 1 we are only interested in testing the null hypothesis that the true object ratings are equal against the alternative that the true ratings are in two groups, within which they are equal, but between which they are not necessarily equal. A test for this situation is given by Test II. For this example we will take $s = 2$ objects in the first group and $t - s = 3$ objects in the second group. To determine $X$, the number of times an object of the first group of $s = 2$ objects is ranked above an object of the second group of $t - s = 3$ objects, recode the objects in such a way that $a_1, \ldots, a_s$ are in the first group and $a_{s+1}, \ldots, a_t$ are in the second group. Then for judge $u$ $(u = 1, \ldots, v)$, construct a new preference matrix corresponding to $R_u$ of Example 1, but this time use the notation

$$r_{iju} = \begin{cases} 1 & \text{if } (a_i \to a_j \mid u), \\ 0 & \text{if } (a_i \leftarrow a_j \mid u), \end{cases}$$

if objects $a_i$ and $a_j$ are compared by judge $u$. Then $X$ will be the sum of the elements of the $v$ $(s \times (t-s))$ submatrices contained in the upper right-hand corner of the new $t \times t$ preference matrices corresponding to each judge. For the data of our example, the six $(2 \times 3)$ submatrices are:

$$\begin{pmatrix} 0&1&0\\ 0&1&1 \end{pmatrix}, \quad \begin{pmatrix} 0&1&1\\ 1&0&1 \end{pmatrix}, \quad \begin{pmatrix} 0&1&0\\ 0&0&0 \end{pmatrix}, \quad \begin{pmatrix} 0&0&1\\ 0&1&0 \end{pmatrix}, \quad \begin{pmatrix} 0&0&0\\ 0&0&1 \end{pmatrix}, \quad \begin{pmatrix} 1&0&1\\ 1&1&0 \end{pmatrix}.$$

Summing the elements of these, we obtain $X = 15$. Hence, from (6-14), the maximum-likelihood estimate of $\pi$ is

$$p = \frac{X}{ks(t-s)^2 + (2s-t)X} = \frac{15}{39} = 0.385.$$

If we consider the preference of an object in the first group over an object in the second group as a success, we have $X = 15$ successes in $ks(t-s) = 18$ trials. The probability of such an occurrence, under the hypothesis that the probability is one-half that an object of the first group is ranked over an object of the second group, is $P(X = 15 \mid H_0) = 0.0031$. In addition, $P(X > 15 \mid H_0) = 0.0096$. Hence, under the assumption that the judges are consistent as a group, we would reject the null hypothesis of equality of true object ratings in favour of the alternative hypothesis $H_2$, at approximately the 0.01 level of significance.


The author wishes to thank Prof. R. C. Bose for his suggestion of the problem and his 
helpful guidance throughout its investigation. 


REFERENCES 


BOSE, R. C. (1956). Paired comparison designs for testing concordance between judges. Biometrika, 43, 113.

BRADLEY, R. A. & TERRY, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika, 39, 324.

BRADLEY, R. A. (1954a). Incomplete block rank analysis: On appropriateness of the model for a method of paired comparisons. Biometrics, 10, 375.

BRADLEY, R. A. (1954b). Rank analysis of incomplete block designs. II. Additional tables for the method of paired comparisons. Biometrika, 41, 502.

BRADLEY, R. A. (1955). Rank analysis of incomplete block designs. III. Some large-sample results on estimation and power for a method of paired comparisons. Biometrika, 42, 450.

DYKSTRA, OTTO, JR. (1956). A note on the rank analysis of incomplete block designs; applications beyond the scope of existing tables. Biometrics, 12, 301.

KENDALL, M. G. (1955). Further contributions to the theory of paired comparisons. Biometrics, 11, 43.

TERRY, M. E., BRADLEY, R. A. & DAVIS, L. L. (1952). New designs and techniques for organoleptic testing. Food Tech. 6, 250.

WILKINSON, J. W. (1956). Analysis of paired comparison designs with incomplete repetitions. Institute of Statistics, University of North Carolina, Mimeo. Series 147.












NON-NULL RANKING MODELS. I 


By C. L. MALLOWS 
University College, London 


1. INTRODUCTION AND SUMMARY 


Kendall (1950) has remarked that the major outstanding problem in ranking theory is the 
specification of a suitable population of ranks in non-null cases. Much attention has been 
concentrated on situations which Daniels (1950, § 5) calls of type (i): 


‘The sample is regarded as having been randomly chosen from a bivariate population 
of ranks’ 


the underlying population being either finite or infinite, e.g. bivariate Normal. Rather less 
work has been done on Daniels’s type (ii): 


‘There is a fixed set of individuals being assessed by a population of judges, or by the 
same judge in repeated trials, on a particular attribute whose ranking is known a priori. 
The random element is uncertainty of preference, the correlation being the result of real 
differences between the individuals, and the population is one of rankings conditional on 
a given objective order.’ 


Daniels (1950), following Babington Smith, Thurstone and Mann, treats this problem as one of regression. The present approach is by way of paired-comparison theory. The judge is assumed to arrive at a ranking of $n$ objects $U_1, U_2, \ldots, U_n$ by first making all the ${}^nC_2$ comparisons between pairs independently, but then only accepting the results if they are consistent with a ranking of the $n$ objects.

Various non-null models are proposed. The general model (eqn. (1)) depends on ${}^{n}C_2$ parameters; this number is then reduced to $n-1$ by using the Bradley-Terry (1952) paired-comparison model (eqn. (4)). An alternative method of simplification is proposed which makes the probability of putting (in paired-comparisons) any two objects $U_i$ and $U_j$ in the correct order equal to
$$\tfrac12 + \tfrac12\tanh(k\log\theta + \log\phi),$$
where $\theta$ and $\phi$ are parameters, and $k = j-i$ is the difference between the true ranks of the two objects. Thus this probability is a simple monotonic function of a quantity which is composed of a term increasing linearly with $k$, and a constant term. The null hypothesis corresponds to $\theta = \phi = 1$. It is found that $\theta$ is associated with Spearman's coefficient $r_s$, and $\phi$ with Kendall's $t_k$ (eqn. (9)). Each of these parameters has a further interpretation. Thus $\theta$ may be regarded as assigning weights to the $n$ objects, these weights being in geometric progression. The paired-comparisons are then made in such a way that the probability of ranking one object higher than another is a simple function of the weights assigned to these two objects. The parameter $\phi$ has the following further interpretation: having obtained a ranking of length $n-1$ and wishing to introduce a further object, $\phi$ specifies the probabilities of this object being ranked in the various possible positions; these probabilities being in geometric progression, decreasing away from the true position (eqn. (18)).

Putting $\phi = 1$ in the general model gives a special case of the Bradley-Terry model (eqn. (11)). It is shown, however, that asymptotically, when the joint distribution of $r_s$ and $t_k$ tends to the bivariate Normal form, the two coefficients cannot distinguish between the two parameters; it is therefore proposed to put $\theta = 1$ and to use only the one parameter $\phi$ (eqn. (12)). This leads to exceptionally simple results, including an explicit form for the probability generating function (p.g.f.) of $t_k$, and an invariance property of the probabilities $p_{ij}$ of obtaining two objects $U_i$ and $U_j$ in the correct order in the ranking; $p_{ij}$ is found to depend only on $j-i$ and $\phi$, and not on $i$ or $n$ (§ 9).

Methods of estimating ¢ are given; tests are derived which may be used to decide whether 
two judges’ rankings of the same objects are consistent with the same non-null hypothesis; 
more generally, given $m$ judges, each of whom produces $l$ rankings, we may test both for
differences between judges and for inconsistencies within judges. The power of these tests 
follows immediately from standard theory. 

An expression is derived giving the conditional expectation of $r_s$ for given $t_k$.


2. THE GENERAL MODEL, $\mathcal{H}_1$
In the method of paired-comparisons, $n$ objects $U_1, U_2, \ldots, U_n$, whose a priori ranking is here assumed known, are ranked in pairs, there being ${}^{n}C_2$ such comparisons in all. Babington Smith (1950) suggested that a suitable non-null hypothesis for this situation would be to assign to each pair $(i, j)$ $(i<j)$ the probability $\pi_{ij}$ of ranking $U_i$ lower than $U_j$; this we shall denote by
$$\pi_{ij} = P\{U_i < U_j\} \quad (1 \le i < j \le n),$$
and to assume that all the comparisons are independent.
The proposed general non-null ranking model, depending on ${}^{n}C_2$ parameters $\pi_{ij}$, is as follows:
(i) The above model is used to generate a set of ${}^{n}C_2$ comparisons.

(ii) If the resulting complex of comparisons is consistent, i.e. if there are no circular triads of comparisons such as $U_i<U_j<U_k<U_i$, then the complex is equivalent to a ranking; to each of the objects $U_i$ may be assigned an integer $u_i$ giving the position of that object in the ranking. Thus in the case $n = 4$, the comparisons
$$U_1>U_2,\quad U_1>U_3,\quad U_1>U_4,\quad U_2<U_3,\quad U_2<U_4,\quad U_3>U_4$$
give the ranking $U_2<U_4<U_3<U_1$, which may be expressed by
$$u_1=4,\quad u_2=1,\quad u_3=3,\quad u_4=2.$$
We then have $u_2<u_4<u_3<u_1$.


(iii) If the complex of comparisons is inconsistent, no ranking is possible. The procedure 
(i) is then repeated as often as necessary until a consistent complex is obtained; and the 
corresponding ranking is accepted. 

The above is a possible way of generating rankings. It should be emphasized that it is 
not suggested that any actual experiment would be performed in this fashion; if the paired- 
comparisons can be made independently (and recorded), there is little point in trying to 
force the preferences into a ranking. What is here attempted is the production of a model 
for situations where the observed data is a ranking, the difficulty being that the comparisons are no longer independent (thus if $U_i<U_j$ and $U_j<U_k$, we must have $U_i<U_k$). We are introducing this dependence by considering a conditional distribution of paired-comparisons; we admit only a subset (containing $n!$ members) of the total $2^{\,{}^{n}C_2}$ possible outcomes.
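
Steps (i) to (iii) can be sketched as a rejection sampler. This is only an illustration of the construction, not a procedure proposed in the paper; the function name `sample_ranking` and the layout of `pi` (a 0-based square list in which only the entries with `i < j` are read) are assumptions made for the example.

```python
import itertools
import random


def sample_ranking(pi, rng, max_tries=100_000):
    """Draw one ranking by rejection, following steps (i)-(iii).

    pi[i][j] for i < j is the probability that object i is judged lower
    than object j (hypothetical 0-based layout).  Returns the ranks
    (u_1, ..., u_n), each in 1..n.
    """
    n = len(pi)
    for _ in range(max_tries):
        below = [0] * n  # below[j] = number of objects judged lower than j
        for i, j in itertools.combinations(range(n), 2):
            if rng.random() < pi[i][j]:
                below[j] += 1  # outcome U_i < U_j
            else:
                below[i] += 1  # outcome U_i > U_j
        # A complete set of comparisons contains no circular triad exactly
        # when these counts are 0, 1, ..., n-1 in some order.
        if sorted(below) == list(range(n)):
            return tuple(b + 1 for b in below)
    raise RuntimeError("no consistent set of comparisons found")


certain = [[1.0] * 4 for _ in range(4)]  # every pair judged in the true order
print(sample_ranking(certain, random.Random(0)))  # -> (1, 2, 3, 4)
```

The acceptance step uses the fact that a complete set of paired comparisons is free of circular triads only when each object beats a different number of others.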




The probability of obtaining any given ranking $(u)$ from the above construction will be proportional to the probability under the Babington Smith model of the corresponding complex of comparisons. Thus the probability of the ranking
$$(u) = (u_1, u_2, u_3, u_4) = (4, 1, 3, 2)$$
above is proportional to
$$(1-\pi_{12})(1-\pi_{13})(1-\pi_{14})\,\pi_{23}\,\pi_{24}\,(1-\pi_{34});$$
putting $\lambda_{ij} = \pi_{ij}/(1-\pi_{ij})$, we have that the probability of the general ranking $(u) = (u_1, u_2, \ldots, u_n)$ under this hypothesis is
$$\mathcal{H}_1:\quad P\{(u)\} = K_1 \prod_{1\le i<j\le n} \lambda_{ij}^{\frac12\operatorname{sgn}(u_j-u_i)}, \tag{1}$$
where $K_1$ is so chosen that the probabilities sum to 1; i.e.
$$K_1^{-1} = \sum_{(u)} \prod_{1\le i<j\le n} \lambda_{ij}^{\frac12\operatorname{sgn}(u_j-u_i)} = P\{\text{comparisons are consistent}\} \prod_{1\le i<j\le n} [\pi_{ij}(1-\pi_{ij})]^{-\frac12}.$$
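
For small $n$ the probabilities in eqn. (1) can be checked by direct enumeration. The sketch below works with the unnormalized products of $\pi_{ij}$ and $1-\pi_{ij}$, which are proportional to the products of $\lambda_{ij}^{\pm\frac12}$; the name `ranking_probabilities` and the 0-based layout of `pi` are assumptions of the example.

```python
import itertools


def ranking_probabilities(pi):
    """Enumerate all rankings and normalise the Babington Smith weights.

    pi[i][j] for i < j is P{U_i < U_j} (hypothetical 0-based layout).
    Returns (probabilities keyed by ranking, P{comparisons are consistent}).
    """
    n = len(pi)
    weights = {}
    for u in itertools.permutations(range(1, n + 1)):
        w = 1.0
        for i, j in itertools.combinations(range(n), 2):
            # factor pi_ij if the pair is in the true order, else 1 - pi_ij
            w *= pi[i][j] if u[i] < u[j] else 1.0 - pi[i][j]
        weights[u] = w
    total = sum(weights.values())  # probability that a complex is consistent
    return {u: w / total for u, w in weights.items()}, total


probs, p_consistent = ranking_probabilities([[0.7] * 3 for _ in range(3)])
print(round(p_consistent, 4), round(probs[(1, 2, 3)], 4))
```

With every $\pi_{ij} = 0.7$ and $n = 3$, the six rankings carry weights $0.343$, $0.147$ (twice), $0.063$ (twice) and $0.027$, so a complex of comparisons is consistent with probability $0.79$.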


3. SPECIALIZATIONS 


The above general model is rather cumbersome; we now consider possible ways of special- 
izing it so as to reduce the number of parameters. We shall attempt to find models which 
simplify 
(i) the probabilities $P\{(u)\}$ of the various rankings $(u)$;
(ii) the distribution problems connected with the two coefficients $r_s$ and $t_k$;
(iii) various other interesting and important quantities, such as
$$p_{ij} = P\{u_i < u_j \mid \text{all comparisons consistent}\}, \tag{2}$$
which are complicated in the general case.


4. THE BRADLEY-TERRY MODEL 
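In the paired-comparison case, Bradley & Terry (1952) reduce the number of parameters from ${}^{n}C_2$ to (effectively) $n-1$ by assuming that
$$\pi_{ij} = P\{U_i<U_j\} = \pi_j/(\pi_i+\pi_j) \quad (1\le i<j\le n)$$
for some non-negative numbers $\pi_1, \pi_2, \ldots, \pi_n$. This gives
$$\lambda_{ij} = \pi_j/\pi_i \tag{3}$$
and for the proposed ranking model, we have
$$\mathcal{H}_2:\quad P\{(u)\} \propto \prod_{1\le i<j\le n} (\pi_j/\pi_i)^{\frac12\operatorname{sgn}(u_j-u_i)} = \prod_{1\le i\le n} \pi_i^{\,u_i-\frac12(n+1)} \propto \prod_{1\le i\le n} \pi_i^{\,u_i}. \tag{4}$$
This is a term in the expansion of the permanent
$$\operatorname{per}\begin{pmatrix} \pi_1 & \pi_1^2 & \cdots & \pi_1^n \\ \pi_2 & \pi_2^2 & \cdots & \pi_2^n \\ \vdots & \vdots & & \vdots \\ \pi_n & \pi_n^2 & \cdots & \pi_n^n \end{pmatrix}. \tag{5}$$
We note that this model is closely associated with what we may call a 'generalized matching coefficient', this being taken to mean a coefficient
$$A(u) = \sum_{1\le i\le n} \alpha_{i u_i},$$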






where the $\{\alpha_{ij}\}$ can take any values whatsoever. Examples of measures which are essentially of this form are:
(i) Number of matches; put $\alpha_{ij} = \delta_i^{\,j}$ (Kronecker $\delta$).

(ii) Spearman's $r_s$; put $\alpha_{ij} = (i-j)^2$.

(iii) Spearman's 'footrule'; put $\alpha_{ij} = |i-j|$.

Others which may be considered are:

(iv) Number of matches + near misses; put $\alpha_{ij} = \delta_i^{\,j-1}+\delta_i^{\,j}+\delta_i^{\,j+1}$.

(v) Weighted form of (iv); put $\alpha_{ij} = \delta_i^{\,j-1}+K\delta_i^{\,j}+\delta_i^{\,j+1}$, where, for example, $K = 2$.

In the above cases, the various measures, $C(u)$ say, are equal either to the corresponding $A(u)$ directly, or to a simple function of $A(u)$; we have
$$C(u) = F(A(u)) \quad\text{for all } (u), \tag{6}$$
where also
$$A(u) = F^{-1}(C(u)) \quad\text{for all } (u). \tag{7}$$
The score $S(u)$ for Kendall's $t_k$ (see eqn. (8)) can be expressed in the form (6), but for $n \ge 4$ the inverse function as in (7) does not exist (see Appendix I).
The p.g.f. of $A(u)$ for the above model is simply
$$\mathcal{H}_2:\quad \sum_{(u)} P\{(u)\}\, z^{A(u)} = \operatorname{per}\big[\pi_i^{\,j} z^{\alpha_{ij}}\big] \big/ \operatorname{per}\big[\pi_i^{\,j}\big].$$


However, in the absence of simple methods of manipulating permanents, we shall not 
investigate this model further at present. 

Barton & David (1956) have proposed an alternative hypothesis based on distorting the 
null matching distribution. 


5. AN ALTERNATIVE METHOD 
We now consider an alternative method of simplification of the model $\mathcal{H}_1$. The most general non-null model possible would specify the probability of each ranking separately, needing $n!-1$ parameters. We have reduced this number to ${}^{n}C_2$ by assuming a special structure for the model; it can only be further reduced by even more restrictive assumptions. Consider the following additional assumption:
Assumption A. Rankings $(u)$, giving (when compared with the standard ranking $(1, 2, \ldots, n)$) the same values for both $r_s$ and $t_k$, have the same probability.
This assumption is equivalent to
Assumption A'. The probability of any ranking $(u)$ is a function only of
$$R(u) = \sum\sum_{1\le i<j\le n} (j-i)\operatorname{sgn}(u_j-u_i) = 2\sum_{1\le i\le n} i\,u_i - \tfrac12 n(n+1)^2 = \tfrac16 n(n^2-1)\,r_s$$
and
$$S(u) = \sum\sum_{1\le i<j\le n} \operatorname{sgn}(u_j-u_i) = \tfrac12 n(n-1)\,t_k. \tag{8}$$





Consider also the following assumption: 


Assumption B. Pairs of rankings which are inverses of one another have the same 
probability. 










Two rankings $(u)$, $(v)$ are said to be inverses (sometimes 'conjugates') if, when $u_i = j$, then $v_j = i$. Thus the rankings $(u) = (3, 5, 2, 4, 1)$ and $(v) = (5, 3, 1, 4, 2)$ are inverses since
$$u_1 = 3,\ v_3 = 1;\quad u_2 = 5,\ v_5 = 2;\quad \text{etc.}$$
It is well known (see, for example, Kendall, 1955, p. 6 (not p. 11)) that for such a pair of rankings
$$R(u) = R(v),\quad S(u) = S(v),$$
and so Assumption B is contained in Assumption A.
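
The equality of the scores for a pair of inverse rankings is easy to confirm numerically for the example above. The sketch below is illustrative only; the helper names `score_R`, `score_S` and `inverse` are not from the paper.

```python
def sgn(x):
    """Sign of x as -1, 0 or +1."""
    return (x > 0) - (x < 0)


def score_S(u):
    """S(u): sum over pairs i < j of sgn(u_j - u_i)."""
    n = len(u)
    return sum(sgn(u[j] - u[i]) for i in range(n) for j in range(i + 1, n))


def score_R(u):
    """R(u): sum over pairs i < j of (j - i) * sgn(u_j - u_i)."""
    n = len(u)
    return sum((j - i) * sgn(u[j] - u[i])
               for i in range(n) for j in range(i + 1, n))


def inverse(u):
    """Inverse ranking v, defined by v_j = i whenever u_i = j."""
    v = [0] * len(u)
    for i, ui in enumerate(u, start=1):
        v[ui - 1] = i
    return tuple(v)


u = (3, 5, 2, 4, 1)
v = inverse(u)
print(v, score_R(u), score_R(v), score_S(u), score_S(v))
# -> (5, 3, 1, 4, 2) -10 -10 -4 -4
```

The value $R(u) = -10$ also agrees with the second form in eqn. (8): $2\sum i\,u_i - \tfrac12 n(n+1)^2 = 80 - 90$.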


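From Assumption B and eqn. (1) we may obtain certain relations between the parameters $\lambda_{ij}$; thus for the pair of rankings above we obtain
$$P\{(u)\} = K_1\,\lambda_{12}^{\frac12}\lambda_{13}^{-\frac12}\lambda_{14}^{\frac12}\lambda_{15}^{-\frac12}\lambda_{23}^{-\frac12}\lambda_{24}^{-\frac12}\lambda_{25}^{-\frac12}\lambda_{34}^{\frac12}\lambda_{35}^{-\frac12}\lambda_{45}^{-\frac12},$$
$$P\{(v)\} = K_1\,\lambda_{12}^{-\frac12}\lambda_{13}^{-\frac12}\lambda_{14}^{-\frac12}\lambda_{15}^{-\frac12}\lambda_{23}^{-\frac12}\lambda_{24}^{\frac12}\lambda_{25}^{-\frac12}\lambda_{34}^{\frac12}\lambda_{35}^{\frac12}\lambda_{45}^{-\frac12},$$
and for these to be equal we must have
$$\lambda_{12}\lambda_{14} = \lambda_{24}\lambda_{35}.$$
Proceeding thus for all pairs of inverses we obtain the following relations (for $n = 5$):
$$\lambda_{12} = \lambda_{23} = \lambda_{34} = \lambda_{45} = \lambda_1, \text{ say},$$
$$\lambda_{13} = \lambda_{24} = \lambda_{35} = \lambda_2, \text{ say},$$
$$\lambda_{14} = \lambda_{25} = \lambda_3, \text{ say},$$
and we put $\lambda_{15} = \lambda_4$. Further,
$$\lambda_1\lambda_3 = \lambda_2^2,\quad \lambda_2\lambda_4 = \lambda_3^2.$$
Similar relations (in fact subsets of the above) are obtained for $n = 3, 4$. These relations are generalized in Appendix II and show that Assumption B contains
Assumption C. For any (fixed) $n$, we have

(i) $\lambda_{ij} = \lambda_{j-i}$ $(1\le i<j\le n)$;
(ii) $\lambda_{k-1}\lambda_{k+1} = \lambda_k^2$ $(2\le k<n-1)$.

Assumption C (ii) implies that the $\{\lambda_k\}$ are in geometrical progression, i.e. for some $\theta$, $\phi$ we have
$$\lambda_k = \theta^{2k}\phi^2,$$
i.e.
$$P\{U_i<U_{i+k}\} = \frac{\theta^k\phi}{\theta^k\phi+\theta^{-k}\phi^{-1}} = \tfrac12+\tfrac12\tanh(k\log\theta+\log\phi).$$
Hence
$$\mathcal{H}(\theta,\phi):\quad P\{(u)\} = K_2 \prod_{1\le i<j\le n} (\theta^{\,j-i}\phi)^{\operatorname{sgn}(u_j-u_i)} = K_2\,\theta^{R(u)}\phi^{S(u)}. \tag{9}$$

Assumption C thus contains Assumption A. We deduce that all three assumptions are equivalent. We shall take (9) to be our basic non-null model, $\mathcal{H}(\theta,\phi)$.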


6. PROPERTIES OF THE MODEL $\mathcal{H}(\theta,\phi)$
Certain properties of this model are immediate.
(a) Let the null-hypothesis probability of a pair of values $(R, S)$ be
$$P_0(R,S) = (n!)^{-1}F(R,S),$$













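so that there are just $F(R,S)$ different rankings $(u)$ having $R(u) = R$, $S(u) = S$. Then for the present model we have
$$\mathcal{H}(\theta,\phi):\quad P_{\theta,\phi}(R,S) \propto \theta^R\phi^S P_0(R,S).$$
If the p.g.f. under the null hypothesis is
$$\mathcal{H}_0:\quad M_0(x,y) = \sum_{R,S} P_0(R,S)\,x^R y^S,$$
then the p.g.f. for the present model is
$$\mathcal{H}(\theta,\phi):\quad M_{\theta,\phi}(x,y) = M_0(x\theta, y\phi)/M_0(\theta,\phi). \tag{10}$$
(b) The present model will be a special case of the one considered in § 4 if (from eqn. (3))
$$\pi_{i+k}/\pi_i = \theta^{2k}\phi^2 \quad (1\le i<i+k\le n);$$
this requires $\phi = 1$, and then we may take without loss of generality
$$\pi_i = \theta^{2i} \quad (1\le i\le n).$$
Then
$$P\{U_i<U_{i+k}\} = \theta^k/(\theta^k+\theta^{-k})$$
and
$$\mathcal{H}(\theta):\quad P\{(u)\} = \theta^{2\sum i u_i}\big/\operatorname{per}\big[\theta^{2ij}\big] \propto \theta^{R(u)}. \tag{11}$$
As with $\mathcal{H}_2$, this model is unmanageable because of the permanents involved.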

(c) It is known (Daniels, 1944; Hoeffding, 1948) that under the null hypothesis, the joint distribution of $R$ and $S$ tends to the bivariate Normal form; i.e. that for any suitable† set $\mathcal{A}$ of points $(R, S)$ containing $\alpha$ such points we have
$$\sum_{(R,S)\in\mathcal{A}} P_0(R,S) \simeq \iint_{\mathcal{A}^*} f(R,S)\,dR\,dS,$$
where $\mathcal{A}^*$ is a convex region of area $4\alpha$ containing $\mathcal{A}$, but no lattice points outside $\mathcal{A}$; the factor 4 enters since $\Delta R = \Delta S = 2$. $f(R,S)$ is the bivariate Normal distribution function with
$$\mu_R = 0,\quad \sigma_R^2 = \tfrac1{36}n^2(n-1)(n+1)^2,\quad \mu_S = 0,\quad \sigma_S^2 = \tfrac1{18}n(n-1)(2n+5),\quad \rho = \frac{2(n+1)}{\sqrt{\{2n(2n+5)\}}} \sim 1-\frac1{4n}.$$
For the model $\mathcal{H}(\theta,\phi)$ we have approximately (for $\alpha$ 'small')
$$\sum_{(R,S)\in\mathcal{A}} P_{\theta,\phi}(R,S) \propto \theta^R\phi^S \iint_{\mathcal{A}^*} f(R,S)\,dR\,dS,$$
whence asymptotically, under suitable approaches to the limit for $\theta$ and $\phi$, $R$ and $S$ are bivariate Normal with
$$\mathcal{E}(R) \simeq \sigma_R(\sigma_R\log\theta+\rho\sigma_S\log\phi),\quad \operatorname{var}R \simeq \sigma_R^2,$$
$$\mathcal{E}(S) \simeq \sigma_S(\rho\sigma_R\log\theta+\sigma_S\log\phi),\quad \operatorname{var}S \simeq \sigma_S^2,$$
$$\rho(R,S) \simeq \rho.$$

† The usual proofs demonstrate only that the cumulative distribution tends to the Normal form; what is required here is that the ordinates, averaged over $\alpha$ points, tend to the Normal ordinates. I believe that the result for $\alpha = 1$ has not been proved; Haden (1947) has proved the result for the marginal distribution of $S$, but in view of the well-known erratic behaviour of $R$ the bivariate result is more difficult. I conjecture it to be true; however, for the present we only need to consider sets with
$$\alpha = O(n^{-1}\sigma_R\sigma_S) = O(n^3).$$







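Conditions under which this is the correct limiting form are as follows. The asymptotic result in the null case may be written
$$\lim_{n\to\infty}\left\{\sum_{R\le R_0,\,S\le S_0} P_0(R,S) \Big/ \int_{-\infty}^{R_0}\!\int_{-\infty}^{S_0} f(R,S)\,dR\,dS\right\} = 1,$$
where, as $n\to\infty$,
$$R_0/\sigma_R = O(1),\quad S_0/\sigma_S = O(1),\quad R_0/\sigma_R - S_0/\sigma_S = O(n^{-\frac12}).$$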
This last condition follows from the approach of $\rho$ to 1. (It is possible that weaker conditions (e.g. $R_0/\sigma_R$ = O(n4)) may suffice; however, this is not known.) For the non-null asymptotic result to hold


$$\mathcal{E}(R)/\sigma_R = O(1),\quad \mathcal{E}(S)/\sigma_S = O(1),\quad \mathcal{E}(R)/\sigma_R - \mathcal{E}(S)/\sigma_S = O(n^{-\frac12}),$$
whence $\sigma_R\log\theta = O(1)$, $\sigma_S\log\phi = O(1)$, i.e. $\log\theta = O(n^{-\frac52})$, $\log\phi = O(n^{-\frac32})$. Putting $\lim \sigma_R\log\theta = \mu'$, $\lim \sigma_S\log\phi = \mu''$, we have
$$\lim \mathcal{E}(R)/\sigma_R = \mu'+\mu'' = \lim \mathcal{E}(S)/\sigma_S.$$

Thus the two parameters are asymptotically indistinguishable; each of them merely shifts the bivariate distribution in the direction $R/\sigma_R = S/\sigma_S$. We deduce further that $R$ and $S$ are asymptotically equivalent in the model $\mathcal{H}(\theta,\phi)$, i.e. they have asymptotically equal power for detecting a change in $(\sigma_R\log\theta+\sigma_S\log\phi)$, at least whenever the above is the correct limiting form.

(d) From the form of eqn. (9), $R(u)$ and $S(u)$ are respectively (and jointly) sufficient for $\theta$ and $\phi$; they will therefore provide most-efficient estimators whenever their asymptotic joint distribution is Normal. This brings out a distinction between the present model (which is a pure ranking model) and other models which have been suggested. Thus if a sample of $n$ is drawn from a bivariate Normal population with correlation $\rho$, and the variate values are replaced by ranks, then Pitman's asymptotic relative efficiency (A.R.E.) of $R$ (or $S$) in estimating $\rho$ (relative to the product-moment correlation) is known to be $9\pi^{-2}$ (Hotelling & Pabst, 1936; Moran, 1951). Lehmann (1953) considers a certain non-parametric system of alternatives to independence, and shows that $R$ gives the optimum test for these alternatives, for all sample sizes. For the Thurstone-Babington Smith model, where a judge arrives at a ranking of $n$ objects by first making an estimate of the 'value' of each object, these estimates being independently and Normally distributed about a regression depending on the true ranking of the objects (for simplicity the means for the different objects are usually taken to be in arithmetic progression), Stuart (1954) has shown that the A.R.E. of $R$ or $S$ (relative to the product-moment regression coefficient) is $3\pi^{-1}$. It is hoped in a further communication to investigate the relations between the present models and these alternatives.

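7. THE MODEL $\mathcal{H}(\phi)$

We shall now concentrate on the parameter $\phi$. Putting $\theta = 1$ in the basic model (9) gives the '$\phi$-model':
$$\mathcal{H}(\phi):\quad P\{(u)\} = K_n\,\phi^{S(u)} \tag{12}$$
and
$$P\{U_i<U_j\} = \pi = \phi/(\phi+\phi^{-1}) \quad (1\le i<j\le n).$$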







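Writing now $(u_n)$ for $(u)$ to denote the dependence on $n$, the p.g.f. of $S_n = S(u_n)$ under the null hypothesis is known to be (see, for example, Moran, 1950)
$$M_n(z) = \sum_{(u_n)} P_0\{(u_n)\}\,z^{S(u_n)} = \prod_{1\le k\le n} \frac1k\,(z^{k-1}+z^{k-3}+\ldots+z^{1-k}), \tag{13}$$
whence the random variable $S_n$ can be expressed as the sum of $n$ independent random variables $s_k$ with p.g.f.'s
$$m_k(z) = \frac{z^k-z^{-k}}{k(z-z^{-1})} = \frac{\sin kx}{k\sin x},$$
where $z = e^{ix}$. Thus $s_k$ has a uniform distribution on the $k$ points $s_k = k-1, k-3, \ldots, 1-k$. Hence Moran (1950) obtains the cumulant generating function (c.g.f.) of $s_k$ as
$$\log m_k(e^{ix}) = \sum_{m=1}^{\infty} (-1)^m \frac{(ix)^{2m}}{(2m)!}\,\frac{2^{2m-1}B_m}{m}\,(1-k^{2m}), \tag{14}$$
where $B_m$ is the $m$th Bernoulli number ($B_1 = \tfrac16$, $B_2 = \tfrac1{30}$).
For the present model we have
$$M_{n,\phi}(z) = M_n(\phi z)/M_n(\phi) = \prod_{1\le k\le n} \frac{\phi^k z^k-\phi^{-k}z^{-k}}{\phi z-\phi^{-1}z^{-1}}\cdot\frac{\phi-\phi^{-1}}{\phi^k-\phi^{-k}}, \tag{15}$$
whence $S_n$ can again be expressed as the sum of $n$ independent random variables $s_k$, with p.g.f.'s
$$m_{k,\phi}(z) = m_k(\phi z)/m_k(\phi).$$
Thus $s_k$ now has a distribution on the $k$ points $k-1, k-3, \ldots, 1-k$ with probabilities in geometric progression, proportional to $\phi^{s_k}$. Also, since the $n!$ probabilities in (12) sum to 1,
$$K_n^{-1} = n!\,M_n(\phi). \tag{16}$$
Putting $\log\phi = \delta$ we have for the c.g.f. of $s_k$
$$\log m_{k,\phi}(e^{ix}) = \log\frac{\sinh(kix+k\delta)\,\sinh\delta}{\sinh(ix+\delta)\,\sinh k\delta};$$
hence, writing $\beta_k = \coth k\delta = (\phi^k+\phi^{-k})(\phi^k-\phi^{-k})^{-1}$, we have
$$\kappa_{1,k} = k\beta_k-\beta_1,$$
$$\kappa_{2,k} = (\beta_1^2-1)-k^2(\beta_k^2-1),$$
$$\kappa_{3,k} = 2k^3\beta_k(\beta_k^2-1)-2\beta_1(\beta_1^2-1),$$
$$\kappa_{4,k} = 2(\beta_1^2-1)(3\beta_1^2-1)-2k^4(\beta_k^2-1)(3\beta_k^2-1).$$
These reduce to Moran's values when $\delta\to0$. An extreme case is obtained when $k\to\infty$ with $\phi$ constant $\ne 1$; $\beta_k \sim 1+2\phi^{-2k}$ and we find
$$\kappa_{1,k} \sim k-\beta_1+2k\phi^{-2k},$$
$$\kappa_{2,k} \sim (\beta_1^2-1)-4k^2\phi^{-2k},$$
$$\gamma_1 \to -2\beta_1(\beta_1^2-1)^{-\frac12} = -(\phi+\phi^{-1}),$$
$$\gamma_2 \to 2(3\beta_1^2-1)(\beta_1^2-1)^{-1} = 2+\gamma_1^2.$$
The corresponding limiting distribution of $s_k$ is geometric;
$$P\{s_k = k-1-2j\} = (1-\phi^{-2})\,\phi^{-2j} \quad (j = 0, 1, 2, \ldots). \tag{17}$$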








Thus even in this case, the central limit theorem will apply; asymptotically,
$$(S_n-\tfrac12 n(n+1)+n\beta_1)\,(n(\beta_1^2-1))^{-\frac12}$$
is a unit Normal variable. We notice that this is essentially a different limit from that considered in § 6; it is not known whether the corresponding limits for $R(u_n)$ and for the bivariate distribution are Normal, or what happens to the correlation between $R_n$ and $S_n$. In intermediate cases, when $\log\phi\to0$ slower than $n^{-\frac32}$, we would expect the limiting distribution of $S_n$ to be Normal, with $\mathcal{E}(S_n) = o(n^2)$, $\operatorname{var}S_n = o(n^3)$.


8. DECOMPOSITION OF $S_n$

In the null case, Feller (1945) has shown that the variables $s_k$ may be taken as
$$s_k = \sum_{1\le i\le k-1} \operatorname{sgn}(u_k-u_i);$$
this representation may be carried over to the $\phi$-model case. (This contradicts a remark of Moran's (1950).) Let us go back to the original paired-comparison approach. Suppose the judge has decided the ${}^{n-1}C_2$ preferences among $U_1, U_2, \ldots, U_{n-1}$ and has as yet no inconsistencies. Then we have an $(n-1)$-ranking $(u_{n-1})$ say; and the p.g.f. of $S(u_{n-1})$ is $M_{n-1,\phi}(z)$. Now the judge makes the final set of $n-1$ comparisons, $U_i <$ or $> U_n$ for $i = 1, 2, \ldots, n-1$. These comparisons are independent amongst themselves and of the preferences already decided:
$$P\{U_i<U_n\} = \pi \quad (1\le i\le n-1).$$
Of the $2^{n-1}$ possible sets of outcomes for these comparisons, only those sets will be consistent which insert $U_n$ into one of the $n$ intervals between the $n-1$ $U$'s already ranked; the probability that $U_n$ is ranked higher than the lower $j-1$ $U$'s and lower than the higher $n-j$ $U$'s is
$$\pi^{\,j-1}(1-\pi)^{n-j} \quad (1\le j\le n)$$
independently of the ranking $(u_{n-1})$.
This set of comparisons gives a ranking $(u_n)$ with
$$S(u_n) = S(u_{n-1})+2j-1-n = S(u_{n-1})+s_n,$$
say. The probability that the comparisons are consistent is thus
$$\sum_{1\le j\le n} \pi^{\,j-1}(1-\pi)^{n-j} = (\pi^n-(1-\pi)^n)(2\pi-1)^{-1} = C_n, \text{ say};$$
and we have as before that $s_n$ is a random variable independent of $S(u_{n-1})$ (and of $(u_{n-1})$) with
$$P\{s_n\} = C_n^{-1}\,\pi^{\frac12(n-1+s_n)}(1-\pi)^{\frac12(n-1-s_n)} = \frac{\phi-\phi^{-1}}{\phi^n-\phi^{-n}}\,\phi^{s_n} \quad (s_n = 2j-1-n;\ 1\le j\le n). \tag{18}$$
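
The insertion argument suggests a direct way of drawing rankings from the $\phi$-model without any rejection step, and eqn. (18) implies the product form of the normalizing sum $\sum_{(u)}\phi^{S(u)}$. The sketch below illustrates both; the function names are invented for the example and $\phi \ne 1$ is assumed.

```python
import itertools
import random


def score_S(u):
    """S(u): number of pairs in the true order minus number reversed."""
    n = len(u)
    return sum((u[j] > u[i]) - (u[j] < u[i])
               for i in range(n) for j in range(i + 1, n))


def normalizer_bruteforce(n, phi):
    """Sum of phi**S(u) over all n! rankings."""
    return sum(phi ** score_S(u)
               for u in itertools.permutations(range(1, n + 1)))


def normalizer_product(n, phi):
    """Product over k of (phi^k - phi^-k) / (phi - phi^-1); needs phi != 1."""
    out = 1.0
    for k in range(1, n + 1):
        out *= (phi ** k - phi ** -k) / (phi - 1.0 / phi)
    return out


def insertion_sample(n, phi, rng):
    """Insert U_1, ..., U_n in turn; U_m takes position j with
    probability proportional to phi**(2*j), as in eqn. (18)."""
    order = []  # objects listed from lowest rank to highest
    for m in range(1, n + 1):
        positions = list(range(1, m + 1))
        weights = [phi ** (2 * j) for j in positions]
        j = rng.choices(positions, weights=weights)[0]
        order.insert(j - 1, m)
    u = [0] * n
    for rank, obj in enumerate(order, start=1):
        u[obj - 1] = rank
    return tuple(u)


print(normalizer_bruteforce(5, 1.3), normalizer_product(5, 1.3))
print(insertion_sample(6, 1.5, random.Random(0)))
```

The two normalizers should agree to rounding error, which is a numerical check of the factorization in eqn. (15).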





9. AN INVARIANCE PROPERTY 


We now prove for the $\phi$-model an important invariance property, namely, that
$$p_{ij}^{(n)} = P\{u_i<u_j \mid n, \phi, \text{comparisons are consistent}\}$$
depends only on $j-i$ and $\phi$, and not on $i$ or $n$. We prove that
$$p^{(n)}_{i,\,i+m-1} = p^{(m)}_{1,\,m} \quad\text{for } 1\le i\le i+m-1\le n.$$







This will be proved by induction on $n$. It is true by definition for $n = m$; assume that it has been proved for $n = m, m+1, \ldots, a$. Fix $i$, with $1\le i\le a-m+1$. Consider the rankings $(u_a)$ with $S(u_a) = S$ and having the property (property $L$) that $u_i<u_{i+m-1}$; we put $\sum_{(u_a)\mid L,S}$ to denote summation over these rankings. We shall write $((u_a)j)$ for a ranking $(u_{a+1})$ which has $u_{a+1} = j$. We have by the inductive hypothesis
$$p^{(m)}_{1,m} = p^{(a)}_{i,\,i+m-1} = \sum_S \sum_{(u_a)\mid L,S} P\{(u_a)\}.$$
Also
$$p^{(a+1)}_{i,\,i+m-1} = \sum_S \sum_{(u_{a+1})\mid L,S} P\{(u_{a+1})\} = \sum_S \sum_{1\le j\le a+1}\ \sum_{\substack{(u_{a+1})\mid L,S \\ u_{a+1}=j}} P\{((u_a)j)\}.$$
Now $S((u_a)j) = S(u_a)+2j-2-a$, and $((u_a)j)$ has the property $L$ if and only if $(u_a)$ has; hence
$$p^{(a+1)}_{i,\,i+m-1} = \sum_S \sum_{1\le j\le a+1} \sum_{(u_a)\mid L,S} P\{((u_a)j)\} = K_{a+1} \sum_{1\le j\le a+1} \phi^{2j-2-a} \sum_S \sum_{(u_a)\mid L,S} \phi^{S}$$
$$= K_{a+1}\,\frac{\phi^{a+1}-\phi^{-a-1}}{\phi-\phi^{-1}}\,K_a^{-1}\,p^{(a)}_{i,\,i+m-1} = p^{(m)}_{1,m}.$$
This proves the result for all $i$ except $i = (a+1)-m+1$; this case can be proved similarly by considering rankings $(u_{a+1}) = (j(u_a))$. Hence by induction on $a$ we have finally $p^{(n)}_{i,\,i+k} = p_k$ $(1\le i<i+k\le n)$, where the $\{p_k\}$ depend only on $\phi$.


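We may obtain the $\{p_k\}$ explicitly as follows: we write $(i(u_{k-2})j)$ for a ranking $(u_k)$ which has $u_1 = i$, $u_k = j$. Then
$$S(i(u_{k-2})j) = S(u_{k-2})+2(j-i)+\operatorname{sgn}(i-j)$$
and
$$p_{k-1} = p^{(k)}_{1,k} = \sum\sum_{1\le i<j\le k}\ \sum_{(u_{k-2})} P\{(i(u_{k-2})j)\} = K_k \sum\sum_{1\le i<j\le k} \phi^{2(j-i)-1} \sum_{(u_{k-2})} \phi^{S(u_{k-2})}$$
$$= K_k\,(\phi^{2k-1}-k\phi+(k-1)\phi^{-1})\,(\phi-\phi^{-1})^{-2}\,K_{k-2}^{-1},$$
whence for any $i$ $(1\le i<i+k\le n)$
$$\gamma_k = \mathcal{E}(\operatorname{sgn}(u_{i+k}-u_i)) = 2p_k-1 = (k+1)\frac{\phi^{k+1}+\phi^{-k-1}}{\phi^{k+1}-\phi^{-k-1}} - k\,\frac{\phi^k+\phi^{-k}}{\phi^k-\phi^{-k}} = (k+1)\beta_{k+1}-k\beta_k, \tag{19}$$
in the notation of § 7.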
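From this result we have
$$\mathcal{E}(S_n) = \mathcal{E}\sum\sum_{1\le i<j\le n} \operatorname{sgn}(u_j-u_i) = \sum\sum_{1\le i<j\le n} \gamma_{j-i} = \sum_{1\le k\le n-1} (n-k)\gamma_k,$$
$$\mathcal{E}(S_n)-\mathcal{E}(S_{n-1}) = \sum_{1\le k\le n-1} \gamma_k = n\beta_n-\beta_1,$$
in agreement with § 7. Similarly we may obtain $\mathcal{E}(R_n)$:
$$\mathcal{E}(R_n) = \mathcal{E}\sum\sum_{1\le i<j\le n} (j-i)\operatorname{sgn}(u_j-u_i) = \sum\sum_{1\le i<j\le n} (j-i)\gamma_{j-i} = \sum_{1\le k\le n-1} k(n-k)\gamma_k$$
$$= \sum_{1\le k\le n} k(2k-n-1)\beta_k = \sum_{1\le k\le n} (2k-n-1)(k\beta_k-\beta_1). \tag{20}$$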
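
The closed forms for $\mathcal{E}(S_n)$ and $\mathcal{E}(R_n)$ can be compared with a direct enumeration over all rankings for small $n$. The following is a numerical check only, with helper names chosen for the example and $\phi \ne 1$ assumed.

```python
import itertools


def beta(k, phi):
    """beta_k = coth(k * log(phi)) = (phi^k + phi^-k) / (phi^k - phi^-k)."""
    return (phi ** k + phi ** -k) / (phi ** k - phi ** -k)


def expected_S_closed(n, phi):
    """Sum over k of (k * beta_k - beta_1), the first cumulants of the s_k."""
    return sum(k * beta(k, phi) - beta(1, phi) for k in range(1, n + 1))


def expected_R_closed(n, phi):
    """Eqn. (20): sum over k of (2k - n - 1) * (k * beta_k - beta_1)."""
    return sum((2 * k - n - 1) * (k * beta(k, phi) - beta(1, phi))
               for k in range(1, n + 1))


def expected_bruteforce(n, phi):
    """Return (E(R), E(S)) by weighting every ranking with phi**S(u)."""
    total = sum_r = sum_s = 0.0
    for u in itertools.permutations(range(1, n + 1)):
        s = r = 0
        for i in range(n):
            for j in range(i + 1, n):
                sign = 1 if u[j] > u[i] else -1
                s += sign
                r += (j - i) * sign
        w = phi ** s
        total += w
        sum_r += w * r
        sum_s += w * s
    return sum_r / total, sum_s / total


er, es = expected_bruteforce(5, 1.4)
print(er, expected_R_closed(5, 1.4))
print(es, expected_S_closed(5, 1.4))
```

Each printed pair should agree to rounding error.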










10. INTERPRETATION OF THE PARAMETERS 


Having now considered some of the consequences of the Assumptions of § 5, we can give interpretations of the two parameters $\theta$ and $\phi$. Each of them corresponds to a certain type of departure from the null hypothesis, that the judge cannot detect the real differences between the objects:

(a) The model $\mathcal{H}(\theta)$, by its association with the Bradley-Terry model, may be considered to be assigning a weight $w(i) = \theta^{2i}$ to each of the objects $U_i$ to be ranked; this weight remains constant throughout the process of making the paired-comparisons and arriving at the ranking. The paired-comparisons are made according to the rule
$$P\{U_i<U_j\} = \frac{w(j)}{w(i)+w(j)} = \frac{\theta^{2j}}{\theta^{2i}+\theta^{2j}} = \frac{\theta^k}{\theta^k+\theta^{-k}},$$
where $k = j-i$ for $1\le i<j\le n$. Thus this probability depends on the difference between the a priori ranks of the objects, but not on their absolute positions. While this seems an attractive hypothesis, it must be remembered that the rejection of inconsistent sets of comparisons introduces a distortion, the effect of which it is difficult to assess.


Table 1. $\mathcal{H}(\phi)$: Values of $p_k = P\{u_i < u_{i+k}\}$

    phi^2     1.2       1.5       2.0
    pi        0.5455    0.6000    0.6667
    -----------------------------------
    k = 1     0.5455    0.6000    0.6667
        2     0.5754    0.6632    0.7619
        3     0.6049    0.7215    0.8386
        4     0.6337    0.7737    0.8946
        5     0.6617    0.8191    0.9339
        6     0.6887    0.8577    0.9599
        7     0.7145    0.8897    0.9763
        8     0.7392    0.9155    0.9862
        9     0.7626    0.9361    0.9922

(b) The model $\mathcal{H}(\phi)$ puts for the paired-comparisons
$$P\{U_i<U_j\} = \pi = \frac{\phi}{\phi+\phi^{-1}} \quad (1\le i<j\le n), \tag{21}$$
this probability being independent of both $i$ and $j$. This may not seem as reasonable an assumption as that for the $\theta$-model above; however, the quantities we are really interested in are the $p_{ij}$ of eqn. (2).

Table 1 gives the values of
$$p_k = P\{u_i<u_{i+k}\} \quad (1\le i<i+k\le n)$$
(from (19)) for three values of $\phi^2$, namely, 1.2, 1.5 and 2.0, corresponding to
$$\pi = \frac{\phi^2}{1+\phi^2} = 0.\dot5\dot4,\ 0.6,\ 0.\dot6.$$

It was proved in § 9 that $p_k$ is independent of $i$ and $n$.
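
The entries of Table 1 follow from eqn. (19), and the invariance property can be checked by enumeration for small $n$. The sketch below is a numerical illustration with invented helper names; `pair_probability` enumerates all $n!$ rankings and so is only practical for small $n$.

```python
import itertools
import math


def beta(k, phi):
    """beta_k = (phi^k + phi^-k) / (phi^k - phi^-k), phi != 1."""
    return (phi ** k + phi ** -k) / (phi ** k - phi ** -k)


def p_k(k, phi):
    """p_k from eqn. (19): gamma_k = (k+1) beta_{k+1} - k beta_k = 2 p_k - 1."""
    gamma = (k + 1) * beta(k + 1, phi) - k * beta(k, phi)
    return 0.5 * (1.0 + gamma)


def pair_probability(n, phi, i, j):
    """P{u_i < u_j} under the phi-model by enumeration (1-based i < j)."""
    total = hit = 0.0
    for u in itertools.permutations(range(1, n + 1)):
        s = sum((u[b] > u[a]) - (u[b] < u[a])
                for a in range(n) for b in range(a + 1, n))
        w = phi ** s
        total += w
        if u[i - 1] < u[j - 1]:
            hit += w
    return hit / total


for phi_sq in (1.2, 1.5, 2.0):
    phi = math.sqrt(phi_sq)
    print(phi_sq, [round(p_k(k, phi), 4) for k in range(1, 10)])

phi = math.sqrt(2.0)
print(pair_probability(5, phi, 1, 3), pair_probability(5, phi, 3, 5), p_k(2, phi))
```

The last line illustrates that $P\{u_1<u_3\}$ and $P\{u_3<u_5\}$ for $n = 5$ both equal $p_2$.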







It will be seen that these values progress in a very reasonable manner—the further apart 
in the true ranking two objects are, the more probably will the judge put them in the 
correct order. 

From eqn. (18) we have a further interpretation of the parameter $\phi$. Given that the objects $U_1, U_2, \ldots, U_{n-1}$ have been compared and a ranking obtained, the judge now compares $U_n$ with each of these; each comparison being independent of the rest, and according to (21). Now the restriction that only consistent comparisons are accepted means (eqn. (18)) that $U_n$ takes the $j$th position in the ranking with probability proportional to $\phi^{2j}$, i.e. (if $\phi>1$) the probability is largest that $U_n$ take its correct position $(j = n)$, and falls off geometrically as $j$ decreases to 1.

(c) The general model $\mathcal{H}(\theta,\phi)$ (eqn. (9)) puts
$$P\{U_i<U_j\} = \frac{\theta^k\phi}{\theta^k\phi+\theta^{-k}\phi^{-1}} = \tfrac12+\tfrac12\tanh(k\log\theta+\log\phi)$$
for $k = j-i$, $1\le i<j\le n$. This is a simple monotonic function of $k\log\theta+\log\phi$; thus the effects of $\theta$ and $\phi$ are in a sense additive. The term $\log\phi$ gives a departure from the null hypothesis which is the same for all pairs $(i, j)$; the other term adds to this a departure depending linearly on $k = j-i$. This model would appear to be an attractive approximation to the general model $\mathcal{H}_1$ (eqn. (1)).


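11. APPLICATIONS OF THE $\phi$-MODEL
From the form of the model $\mathcal{H}(\phi)$ (eqn. (12)) we have that $S(u)$ is sufficient for $\phi$; problems of estimation and of hypothesis-testing will therefore be most naturally expressed in terms of $S$. However, since $R$ and $S$ are asymptotically equivalent, any procedure using $S$ may if desired be replaced (approximately) by one using $R$.
We shall use the following notation:
$$\delta = \log\phi,\quad \mathcal{E}_\phi = \mathcal{E}(S\mid\mathcal{H}(\phi)),\quad V_\phi = \operatorname{var}(S\mid\mathcal{H}(\phi)),$$
$$\Delta = \mathcal{E}_\phi V_\phi^{-\frac12},\quad V_0 = \operatorname{var}(S\mid\mathcal{H}_0) = \tfrac1{18}n(n-1)(2n+5),$$
which though not entirely consistent, will not lead to confusion.
Approximations. From Moran's expression (14) we have an expansion of $\log M_n(\phi)$ (eqn. (13)) in powers of $\delta$:
$$\log M_n(\phi) = \frac{\delta^2}{6}\sum_{1\le k\le n}(k^2-1) - \frac{\delta^4}{180}\sum_{1\le k\le n}(k^4-1) + \ldots$$
$$= \frac{n(n-1)(2n+5)}{36}\,\delta^2 - \frac{n(n-1)}{5400}\,(6n^3+21n^2+31n+31)\,\delta^4 + \ldots,$$
whence from (15)
$$\mathcal{E}_\phi = \left[\frac{\partial}{\partial z}\,\frac{M_n(\phi z)}{M_n(\phi)}\right]_{z=1} = \frac{d}{d\delta}\log M_n(\phi) = V_0\,\delta - \frac{n(n-1)}{1350}\,(6n^3+21n^2+31n+31)\,\delta^3 + \ldots,$$
$$V_\phi = \frac{d^2}{d\delta^2}\log M_n(\phi) = V_0\left\{1 - \frac{\delta^2}{25}\,\frac{6n^3+21n^2+31n+31}{2n+5} + \ldots\right\}.$$
We notice the curious result
$$V_\phi = \frac{d\mathcal{E}_\phi}{d\delta}.$$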







We obtain further
$$V_\phi = V_0\left\{1 - \frac{27\Delta^2}{25n}\,(1+\ldots) + \ldots\right\}.$$
Hence
$$V_\phi \simeq V'_\phi = V_0 - \frac{27}{25}\,\frac{\mathcal{E}_\phi^2}{n}.$$


Table 2 gives values of the percentage error of the approximation $V'_\phi$ for various values of $n$ and $\phi$, with the corresponding values of $\Delta$. It will be seen that the approximation is remarkably good even for small $n$, provided $\Delta$ is not too large; except for very small $n$, the approximation improves with increasing $n$ if $\Delta$ is held constant (compare the three entries with $\Delta$ = 1.72 or 1.73).

Table 2. $\mathcal{H}(\phi)$: Percentage error of the approximation $V'_\phi$, with corresponding values of $\Delta = \mathcal{E}_\phi V_\phi^{-\frac12}$

% error = $100(V'_\phi - V_\phi)V_\phi^{-1}$

          phi^2 = 1.1        phi^2 = 1.2        phi^2 = 1.5        phi^2 = 2.0
     n   % error   Delta    % error   Delta    % error   Delta    % error   Delta
     2    +0.11    0.05      +0.38    0.09      +1.92    0.20      +5.75    0.35
     4    +0.08    0.14      +0.31    0.27      +1.31    0.61      +3.15    1.07
     6    +0.05    0.25      +0.17    0.49      +0.08    1.12      -4.09    2.01
     8    +0.02    0.39      -0.02    0.75      -2.35    1.73     -19.28    3.17
    10    -0.02    0.54      -0.30    1.04      -6.63    2.44
    12    -0.07    0.70      -0.71    1.36
    14    -0.14    0.88      -1.32    1.72
    16    -0.23    1.07      -2.16    2.11
    18    -0.34    1.28
    20    -0.49    1.50
    22    -0.67    1.73
    24    -0.90    1.97
    26    -1.18    2.23

Under $\mathcal{H}(\phi)$, $S$ is approximately Normally distributed with mean $\mathcal{E}_\phi$ and variance $V_\phi$; for small departures from the null hypothesis, the variance is approximately $V'_\phi$ or more crudely still, $V_0$.
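
Individual entries of Table 2 can be recomputed from the exact cumulants of § 7. The sketch below assumes the approximation takes the form $V'_\phi = V_0 - 27\mathcal{E}_\phi^2/(25n)$ and that $\Delta = \mathcal{E}_\phi V_\phi^{-\frac12}$; under those assumptions it gives values that round to the tabulated $+5.75$ and $0.35$ for $n = 2$, and $+3.15$ and $1.07$ for $n = 4$, at $\phi^2 = 2$. The function names are invented for the example.

```python
import math


def coth(x):
    return 1.0 / math.tanh(x)


def mean_S(n, delta):
    """Exact E(S): sum over k of k*coth(k*delta) - coth(delta), delta != 0."""
    return sum(k * coth(k * delta) - coth(delta) for k in range(1, n + 1))


def var_S(n, delta):
    """Exact var(S): sum of (beta_1^2 - 1) - k^2 (beta_k^2 - 1), delta != 0."""
    b1 = coth(delta)
    return sum((b1 * b1 - 1.0) - k * k * (coth(k * delta) ** 2 - 1.0)
               for k in range(1, n + 1))


def null_var(n):
    """V_0 = n(n-1)(2n+5)/18."""
    return n * (n - 1) * (2 * n + 5) / 18.0


def table2_entry(n, phi_sq):
    """Return (% error of the assumed V', Delta) for given n and phi^2."""
    delta = 0.5 * math.log(phi_sq)
    e = mean_S(n, delta)
    v = var_S(n, delta)
    v_approx = null_var(n) - 27.0 * e * e / (25.0 * n)
    return 100.0 * (v_approx - v) / v, e / math.sqrt(v)


for n in (2, 4):
    err, big_delta = table2_entry(n, 2.0)
    print(n, round(err, 2), round(big_delta, 2))
```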

In the following it should be remembered that we are investigating the judges, and not 
the underlying ranking. This was given a priori; each judge makes repeated attempts to 
reproduce it; we are concerned with 

(a) estimating the degree to which a judge can reproduce the ranking, i.e. (assuming he operates according to the $\phi$-model) estimating his $\phi$;

(b) comparing two attempts at ranking; 

(c) testing for differences between the attempts of several judges. 

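Estimation. Suppose a judge has produced $l$ different rankings of the same $n$ objects; it is desired to estimate his $\phi$. We shall do this by the method of maximum likelihood. We have from eqns. (12) and (16)
$$\log P\{(u)_1, (u)_2, \ldots, (u)_l\} = \log\phi \sum_{i=1}^{l} S(u)_i - l\log\{n!\,M_n(\phi)\},$$
whence the estimation equation is
$$\bar S = \left[\phi\,\frac{d}{d\phi}\log M_n(\phi)\right]_{\phi=\hat\phi} = \mathcal{E}_{\hat\phi} = \mathcal{E}(S\mid\hat\phi);$$
thus the model is fitted by its first moment. For small departures from $\mathcal{H}_0$ we have approximately
$$\bar S \simeq V_0\log\hat\phi.$$

We may give an approximate confidence interval for $\phi$; since under $\mathcal{H}(\phi)$, $\bar S$ is approximately Normal with mean $\mathcal{E}_\phi$ and variance $V_\phi/l$, we have approximately
$$P\{\bar S-\lambda_\alpha\sqrt{(V_\phi/l)} < \mathcal{E}_\phi < \bar S+\lambda_\alpha\sqrt{(V_\phi/l)} \mid \phi\} \simeq 1-2\alpha,$$
where $\lambda_\alpha$ is the $100\alpha$ % point of the unit Normal distribution; more crudely
$$P\{\bar S-\lambda_\alpha\sqrt{(V'_{\bar S}/l)} < V_0\log\phi < \bar S+\lambda_\alpha\sqrt{(V'_{\bar S}/l)} \mid \phi\} \simeq 1-2\alpha,$$
where
$$V'_{\bar S} = V_0 - \frac{27}{25}\,\frac{\bar S^2}{n}. \tag{22}$$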
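
The estimation equation $\bar S = \mathcal{E}(S\mid\hat\phi)$ has no closed-form solution, but $\mathcal{E}_\phi$ is increasing in $\delta = \log\phi$, so a simple bisection suffices. The following is a minimal sketch of one way to solve it, not a procedure given in the paper; `mle_delta` and its arguments are hypothetical names.

```python
import math


def coth(x):
    return 1.0 / math.tanh(x)


def expected_S(n, delta):
    """E(S) under the phi-model, delta = log(phi): sum of k*coth(k*delta) - coth(delta)."""
    if delta == 0.0:
        return 0.0
    return sum(k * coth(k * delta) - coth(delta) for k in range(1, n + 1))


def mle_delta(s_bar, n, iterations=200):
    """Solve s_bar = E(S | delta) for delta by bisection."""
    max_score = n * (n - 1) / 2
    if abs(s_bar) >= max_score:
        raise ValueError("the likelihood has no finite maximum")
    if s_bar == 0:
        return 0.0
    target = abs(s_bar)  # E(S) is odd in delta, so solve for |s_bar| first
    lo, hi = 0.0, 1.0
    while expected_S(n, hi) < target:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_S(n, mid) < target:
            lo = mid
        else:
            hi = mid
    return math.copysign(0.5 * (lo + hi), s_bar)


s_bar = 7.5  # hypothetical mean score over a judge's rankings of n = 8 objects
delta_hat = mle_delta(s_bar, 8)
print(delta_hat, math.exp(delta_hat))        # estimates of log(phi) and phi
print(s_bar / (8 * 7 * 21 / 18))             # first-order value S_bar / V_0
```

For small departures from the null hypothesis the bisection result should be close to the first-order value $\bar S/V_0$.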

Test for consistency between two judges. Suppose two judges have produced rankings $(u)_1$ and $(u)_2$, according to the $\phi$-model, with respective parameters $\phi_1$ and $\phi_2$. Since $S(u)_1$ and $S(u)_2$ are approximately Normally distributed with means $V_0\log\phi_1$ and $V_0\log\phi_2$, and with variances $V_0$, we may test whether $\phi_1 = \phi_2$ by referring
$$(S(u)_1-S(u)_2)\big/\sqrt{(2V_0)}$$
to Normal tables. An improved test is obtained by replacing $V_0$ by $V'$ from (22); i.e. we refer
$$(S(u)_1-S(u)_2)\left\{2V_0-\frac{27}{50n}\,(S(u)_1+S(u)_2)^2\right\}^{-\frac12}$$
to Normal tables.

This test is asymptotically equivalent to the likelihood ratio test.
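
As an illustration of the cruder form of the two-judge test, the statistic $(S(u)_1 - S(u)_2)/\sqrt{(2V_0)}$ and a two-sided Normal tail probability can be computed as follows. The scores used are invented for the example, and the function names are not from the paper.

```python
import math


def null_variance(n):
    """V_0 = n(n-1)(2n+5)/18, the null variance of S."""
    return n * (n - 1) * (2 * n + 5) / 18.0


def two_judge_z(s1, s2, n):
    """Crude test statistic (S_1 - S_2) / sqrt(2 V_0)."""
    return (s1 - s2) / math.sqrt(2.0 * null_variance(n))


def two_sided_p(z):
    """Two-sided tail probability of a unit Normal variable."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z = two_judge_z(20, 4, 8)  # hypothetical scores for rankings of n = 8 objects
print(z, two_sided_p(z))
```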

Test for consistency between several judges. Suppose $m$ judges have each produced $l$ rankings with scores $S_{ij}$ $(j = 1, 2, \ldots, l;\ i = 1, 2, \ldots, m)$. We associate with each judge a value $\phi_i$ of the parameter $\phi$; then under the hypothesis $\phi_1 = \phi_2 = \ldots = \phi_m$, the 'between judges sum of squares'
$$l\sum_{i=1}^{m} (\bar S_{i.}-\bar S_{..})^2$$
(with the usual notation for means) is approximately distributed as $V_0\chi^2_{m-1}$, and the 'within judges sum of squares'
$$\sum_{i=1}^{m}\sum_{j=1}^{l} (S_{ij}-\bar S_{i.})^2$$
is approximately distributed as $V_0\chi^2_{m(l-1)}$. Thus we may test for differences between judges and for inconsistency within judges. More accurate tests would replace $V_0$ by the estimated $V'_\phi$, as before.
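
The two sums of squares can be computed directly from a table of scores. The sketch below uses the cruder scaling by $V_0$ and leaves the comparison with $\chi^2$ tables to the reader; the data and the function name are invented for the example.

```python
def judge_sums_of_squares(scores, n):
    """scores[i][j] = score S of ranking j by judge i (hypothetical layout).

    Returns (between / V_0, m - 1, within / V_0, m * (l - 1)): each scaled
    sum of squares with the degrees of freedom of its chi-square reference.
    """
    m = len(scores)
    l = len(scores[0])
    v0 = n * (n - 1) * (2 * n + 5) / 18.0
    judge_means = [sum(row) / l for row in scores]
    grand_mean = sum(judge_means) / m
    between = l * sum((mean - grand_mean) ** 2 for mean in judge_means)
    within = sum((s - mean) ** 2
                 for row, mean in zip(scores, judge_means) for s in row)
    return between / v0, m - 1, within / v0, m * (l - 1)


# Two judges, two rankings each, of n = 6 objects (invented scores).
print(judge_sums_of_squares([[10, 14], [2, 6]], 6))
```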













The approximate powers of the above tests are immediately obtainable from standard Normal theory. Thus, for example, the power of the test to detect a difference between $\phi_1$ and $\phi_2$ when a two-tailed test is used is approximately
$$1-F(\lambda_\alpha-\mu)+F(-\lambda_\alpha-\mu),$$
where $\lambda_\alpha$ is as before the $100\alpha$ % Normal point, $F$ is the unit Normal distribution function, and
$$\mu = \sqrt{(\tfrac12 V_0)}\,(\log\phi_1-\log\phi_2).$$


12. THE CONDITIONAL EXPECTATION OF R 


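We now obtain an expression giving the conditional expectation of $R$, given $S$; this is the same for the $\phi$-model as for the null case. Writing as in § 6 $M_0(x, y)$ for the null-hypothesis p.g.f. of $R$ and $S$,
$$M_0(x,y) = \sum_{R,S} x^R y^S P_0(R,S),$$
the p.g.f. for the $\phi$-model is $M_0(x, y\phi)/M_0(1, \phi)$. Thus
$$\mathcal{E}_\phi(R) = \left[\frac{\partial}{\partial x}M_0(x,\phi)\right]_{x=1}\Big/ M_0(1,\phi). \tag{23}$$
However, we have for the null hypothesis
$$\left[\frac{\partial}{\partial x}M_0(x,y)\right]_{x=1} = \sum_S y^S P_0(S)\,\mathcal{E}_0(R\mid S),$$
and so from (20) and (23)
$$\sum_S \phi^S P_0(S)\,\mathcal{E}_0(R\mid S) = \mathcal{E}_\phi(R)\,M_0(1,\phi)$$
$$= (n!)^{-1}\sum_{1\le k\le n}(2k-n-1)\left\{k\,\frac{\phi^k+\phi^{-k}}{\phi^k-\phi^{-k}} - \frac{\phi+\phi^{-1}}{\phi-\phi^{-1}}\right\}\prod_{1\le i\le n}\frac{\phi^i-\phi^{-i}}{\phi-\phi^{-1}}$$
$$= (n!)^{-1}\sum_{2\le k\le n}(2k-n-1)\{(k-1)\phi^{k-1}+(k-3)\phi^{k-3}+\ldots+(1-k)\phi^{1-k}\} \times \prod_{\substack{2\le i\le n \\ i\ne k}} (\phi^{i-1}+\phi^{i-3}+\ldots+\phi^{1-i}).$$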

13. CONCLUSION

The $\phi$-model exhibits several features of simplicity; however, difficulty has been encountered in attempting to evaluate such quantities as $P\{u_i = j\}$ and $P\{u_i = j, u_k = l\}$; knowledge of the latter quantity would enable the variance of $R$ and covariance of $R$ and $S$ to be obtained; hence also the conditional variance of $R$, given $S$.

It is hoped in a further communication to present:

(a) an investigation of the result of § 12;

(b) a comparison of the present distributions of $R$ and $S$ with those obtained from other models, e.g. sampling from a bivariate Normal distribution;

(c) an extension of the method to cover those cases where the true ranking is not known a priori.







APPENDIX I 
Demonstration that Kendall's τ is not essentially a 'generalized matching coefficient'
To satisfy eqn. (6) with C(u) = S(u) (the score for Kendall's τ), we could take (for n ≤ 9)

$$\alpha_{ij} = j \cdot 10^{\,n-i},$$

when A(u) is simply the decimal number with digits $u_i$; e.g. A(3, 5, 2, 4, 1) = 35,241. We then define
the function F(x) suitably at the n! different values of x (e.g. between 12,345 and 54,321) and have (6).
However, no matter what $\{\alpha_{ij}\}$ are, we cannot satisfy the inverse relation (7) for all rankings (u) without
having some of the values $F^{-1}(S) = g_S$ equal for different S. Thus with n = 4, consider the rankings with
$u_1 = 1$:

(u)             S(u)    $F^{-1}(S(u)) = A(u)$
(1, 2, 3, 4)     6      $g_6 = \alpha_{11} + \alpha_{22} + \alpha_{33} + \alpha_{44}$
(1, 2, 4, 3)     4      $g_4 = \alpha_{11} + \alpha_{22} + \alpha_{34} + \alpha_{43}$
(1, 3, 2, 4)     4      $g_4 = \alpha_{11} + \alpha_{23} + \alpha_{32} + \alpha_{44}$
(1, 3, 4, 2)     2      $g_2 = \alpha_{11} + \alpha_{23} + \alpha_{34} + \alpha_{42}$
(1, 4, 2, 3)     2      $g_2 = \alpha_{11} + \alpha_{24} + \alpha_{32} + \alpha_{43}$
(1, 4, 3, 2)     0      $g_0 = \alpha_{11} + \alpha_{24} + \alpha_{33} + \alpha_{42}$

Hence for any $\{\alpha_{ij}\}$ we must have

$$g_6 - 2g_4 + 2g_2 - g_0 = 0.$$

There are three similar equations obtainable from the rankings with $u_1 = 2, 3, 4$ in turn; together, they
imply that g, = g_,. A fortiori, (7) is impossible for n> 4. 
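The identity for the rankings with $u_1 = 1$ can be checked numerically. The sketch below assumes, as the table suggests, that $A(u) = \sum_i \alpha_{i u_i}$; the function names and the random coefficients are illustrative only. It confirms the six scores and that the alternating combination of the six values of A(u) vanishes whatever the coefficients are.

```python
import random

def kendall_score(u):
    """Kendall's S: concordant minus discordant pairs relative to the natural order."""
    n = len(u)
    return sum((u[j] > u[i]) - (u[j] < u[i])
               for i in range(n) for j in range(i + 1, n))

def matching_value(u, alpha):
    """A(u) = sum over i of alpha[i][u_i] (ranks are 1-based, the table 0-based)."""
    return sum(alpha[i][r - 1] for i, r in enumerate(u))

rankings = [(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4),
            (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2)]
scores = [kendall_score(u) for u in rankings]

random.seed(0)
alpha = [[random.uniform(-5, 5) for _ in range(4)] for _ in range(4)]  # arbitrary
a = [matching_value(u, alpha) for u in rankings]
# One term per ranking, with the signs of g_6 - 2 g_4 + 2 g_2 - g_0.
combo = a[0] - a[1] - a[2] + a[3] + a[4] - a[5]
```

Because every coefficient cancels in `combo`, the constraint holds for any choice of `alpha`, which is the point of the argument.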


APPENDIX II 


To prove that Assumption B contains Assumption C we shall exhibit various pairs of inverse rankings.
We shall use the 'cycle' notation; thus with n = 6, the cycle (1, 5, 6, 2) (or (5, 6, 2, 1) or (6, 2, 1, 5) or
(2, 1, 5, 6)) will represent the ranking

$$u_1 = 5,\quad u_5 = 6,\quad u_6 = 2,\quad u_2 = 1;\qquad u_3 = 3,\quad u_4 = 4;$$

i.e. the ranking (5, 1, 3, 4, 6, 2). The inverse ranking (2, 6, 3, 4, 1, 5) is represented by the cycle (2, 6, 5, 1),
which is the reflexion of the first cycle. Thus any cycle whose reflexion is not equivalent to itself will
specify a pair of inverse rankings. From the following cycles we now obtain as in § 5 the required relations
between the ;,,’s 


Cycle Relation 
To prove C(i): 
(1, 2, 3) Ri Aes 
(2, 3, 4) Nes = Ags 
ete. Hence Ajj, =A, (1<i<n-1) 
(1, 2, 3, 4) AigAis = AcaAne i.e. Ayg=HAgy 
(2, 3, 4, 5) AssAos = Ags Aas ie.  Agy = Ags ; 
ete. Hence Aji42=A, (1<ti<n—2) 
(1, 2, 3, 4, 5) AyeAisArs = Avs As5 Aus ie. Ayy = Ags : 
ete. Hence A,;43;=A3; (1<i<n-—3) 
and so on. Hence Ajy4,=A, (L<i<itk<n) 


To prove C (ii): 


(1, 3, 4, 2) AisAesAos = ArgArsAsa ie. AAs = AZ 
(1, 4, 5, 2) AA, = AB 
(1, 5, 6, 2) AA, = As 
ete. Hence Axz_,Agy, = AR (2k <n—-1) 
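The cycle notation used in this appendix can be checked mechanically. The following is a minimal sketch (the function names are hypothetical) that expands a cycle into a ranking, forms the inverse ranking, and confirms that the reflected cycle gives the inverse, using the n = 6 example from the text.

```python
def cycle_to_ranking(cycle, n):
    """Ranking (u_1, ..., u_n) in which each cycle entry is sent to the next entry."""
    u = list(range(1, n + 1))          # objects outside the cycle keep their own rank
    for pos, item in enumerate(cycle):
        u[item - 1] = cycle[(pos + 1) % len(cycle)]
    return tuple(u)

def inverse_ranking(u):
    """Inverse permutation v, defined by v[u_i] = i."""
    v = [0] * len(u)
    for i, r in enumerate(u, start=1):
        v[r - 1] = i
    return tuple(v)

ranking = cycle_to_ranking((1, 5, 6, 2), 6)
reflected = cycle_to_ranking((2, 6, 5, 1), 6)
```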










130 Non-null ranking models. I 


REFERENCES 


Barton, D. E. & David, F. N. (1956). 'Spearman's Rho' and the matching problem. Brit. J. Statist.
Psychol. 9, 69-73.

Bradley, R. A. & Terry, M. E. (1952). Rank analysis of incomplete block designs. I. Biometrika,
39, 324-45.

Daniels, H. E. (1944). The relation between measures of correlation in the universe of sample
permutations. Biometrika, 33, 129-35.

Daniels, H. E. (1950). Rank correlation and population models. J. R. Statist. Soc. B, 12, 171-81.

Feller, W. (1945). The fundamental limit theorems in probability. Bull. Amer. Math. Soc. 51, 800-32.

Haden, H. G. (1947). A note on the distribution of the different orderings of n objects. Proc. Camb.
Phil. Soc. 43, 1-9.

Hoeffding, W. (1948). A class of statistics with asymptotically Normal distribution. Ann. Math.
Statist. 19, 293-325.

Hotelling, H. & Pabst, M. R. (1936). Rank correlation and tests of significance involving no
assumption of Normality. Ann. Math. Statist. 7, 29-43.

Kendall, M. G. (1950). Discussion on symposium on ranking methods. J. R. Statist. Soc. B, 12, 189.

Kendall, M. G. (1955). Rank Correlation Methods. London: Charles Griffin and Co. Ltd.

Lehmann, E. L. (1953). The power of rank tests. Ann. Math. Statist. 24, 23-43.

Moran, P. A. P. (1950). Recent developments in ranking theory. J. R. Statist. Soc. B, 12, 153-62.

Moran, P. A. P. (1951). Partial and multiple rank correlation. Biometrika, 38, 26-32.

Smith, B. Babington (1950). Discussion on Professor Ross's paper. J. R. Statist. Soc. B, 12, 54.

Stuart, A. (1954). The correlation between variate-values and ranks in samples from a continuous
distribution. Brit. J. Statist. Psychol. 7, 37-44.







THE GENERALIZATION OF PROBIT ANALYSIS TO THE 
CASE OF MULTIPLE RESPONSES 


By J. AITCHISON AND S. D. SILVEY


University of Glasgow 


1. INTRODUCTION 


The generalized probit analysis which we will discuss in this paper arose from a problem 
in entomology. We will suggest a solution to this particular problem and then consider a 
more general situation where this method of solution might be appropriate.

The particular problem is as follows. In the course of its lifetime the Petrobius Leach 
(Thysanura, Machilidae), which is a primitive wingless insect allied to the domestic silver- 
fish, passes through various stages, technically referred to as ‘instars’. A problem of 
biological interest is to estimate the mean time spent by such insects in each instar (stage). 
Since there are difficulties involved in keeping these insects alive in the laboratory (and 
consequently in keeping them under effectively continuous observation during their life- 
time), they were sampled at various dates in the field, and those observed were classified 
according to instar. The actual set of data to be analysed is given in Table 1.* The insects 
started hatching on 30 April and times will be measured from this date. The problem is to 
estimate from these data the mean time spent in each stage. 


Table 1. Numbers of insects observed in various stages

                          Stage
Date          1     2     3     4     5     6     Total

29 May       31     2     0     0     0     0       33
3 June       78    21     0     0     0     0       99
14 June      18    90    38     4     0     0      150
26 June       0    31    63    23     1     0      118
19 July       0     0     2    12    65    21      100








2. Basic MATHEMATICAL MODEL 


For the purposes of a mathematical model of the situation outlined in the introduction we 
suppose that there are s+1 stages in the development of an insect, and that an insect 
observed is necessarily in one of these stages. The final (s + 1)th stage is eventually reached 
by all insects and there is no question of estimating any unknown parameters for this stage. 
We also suppose that observations are made at m different times denoted by $x_\alpha$
($\alpha = 1, 2, \ldots, m$).

* For entomological details and sampling technique see Delany (1957). Stage 1 in Table 1 consists
of hatching time together with three natural stages, since there seemed to be difficulty in
distinguishing among these.








132 Probit analysis for multiple responses 


The time spent by an insect in the ith stage (i = 1, 2, ..., s) can be regarded as an observation
on a non-negative random variable $\xi_i$, and our problem is to estimate the mean value
$\lambda_i$ of each $\xi_i$. Also the total time spent by an insect in stages 1, 2, ..., r can be regarded as an
observation on a random variable $\eta_r = \sum_{i=1}^{r} \xi_i$. If $\mu_r = E(\eta_r)$ (r = 1, 2, ..., s), then clearly we
can estimate $\lambda_i$ (i = 1, 2, ..., s) by estimating $\mu_r$ (r = 1, 2, ..., s) and then taking differences
of these estimates.

If, for each r, the distribution function $G_r$ of $\eta_r$ is continuous, so that

$$\Pr(\eta_r \le x) = \Pr(\eta_r < x) = G_r(x),$$

then the probability $\pi_i(x)$ that at time x an insect chosen at random will be in the ith stage
is given by

$$\begin{aligned}
\pi_1(x) &= 1 - G_1(x), \\
\pi_i(x) &= G_{i-1}(x) - G_i(x) \quad (i = 2, 3, \ldots, s), \\
\pi_{s+1}(x) &= G_s(x).
\end{aligned}$$

The expressions given for $\pi_1(x)$ and $\pi_{s+1}(x)$ are obvious. For i = 2, 3, ..., s, we have, in the
usual notation,

$$\begin{aligned}
\pi_i(x) &= \Pr(\eta_{i-1} < x,\ \eta_i > x) \\
         &= \Pr(\eta_{i-1} < x) - \Pr(\eta_{i-1} < x,\ \eta_i < x) \\
         &= \Pr(\eta_{i-1} < x) - \Pr(\eta_i < x),
\end{aligned}$$

since the distribution functions of the $\eta_i$'s are continuous and since $\eta_i < x$ implies $\eta_{i-1} < x$.
These expressions for $\pi_i(x)$ (i = 1, 2, ..., s+1) enable us to express the likelihood of the
observed data explicitly in terms of the distribution functions of the $\eta_r$'s.

We now assume that, for i = 1, 2, ..., s, the distribution of the random variable $\xi_i$ is
approximately normal with standard deviation $\sigma_i$, small relative to $\lambda_i$. Then $\eta_r$ also is
approximately normally distributed and we may take

$$G_r(x) = \Phi(z_r) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_r} e^{-\frac{1}{2}t^2}\,dt,$$

where $z_r = (x - \mu_r)/\theta_r$ and $\theta_r^2 = \operatorname{var}(\eta_r) = \operatorname{var}\bigl(\sum_{i=1}^{r} \xi_i\bigr)$. This assumption completes the basic
model.

We note that the assumption that the standard deviation of $\xi_i$ is small relative to its
mean is a necessary adjunct of the assumption of approximate normality of $\xi_i$. It is an essential
part of our model that $\xi_i$ be a non-negative random variable. It can be non-negative
and approximately normal only when its standard deviation is small relative to its mean,
say, $3\sigma_i < \lambda_i$. It must be emphasized that in other applications this condition may not hold.
Then it would be necessary to assume some other form for the distributions of the $\xi_i$'s,
which would make these effectively non-negative random variables.
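As an illustration of the basic model, the stage probabilities can be computed directly from the normal distribution function. This is only a sketch: the function name and the numerical values of the means and standard deviations are made up for the example and are not estimates from the data.

```python
from statistics import NormalDist

def stage_probabilities(x, mu, theta):
    """pi_1(x), ..., pi_{s+1}(x) for the basic normal model.

    mu[r-1] and theta[r-1] are the mean and standard deviation of eta_r,
    the total time spent in stages 1..r.
    """
    std_normal = NormalDist()
    # G_r(x) = Phi((x - mu_r) / theta_r), padded with G_0 = 1 and G_{s+1} = 0.
    g = [1.0] + [std_normal.cdf((x - m) / t) for m, t in zip(mu, theta)] + [0.0]
    return [g[i] - g[i + 1] for i in range(len(g) - 1)]

# Illustrative values only (s = 3, so there are four stages).
mu = [40.0, 50.0, 60.0]
theta = [6.0, 7.0, 8.0]
probs = stage_probabilities(45.0, mu, theta)
```

The probabilities telescope, so they always sum to one; they are all positive only where the fitted distribution functions are properly ordered at x.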


3. TEST OF THIS MODEL 


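In the above model, the only real assumption made is that of approximate normality of the
$\xi_i$'s. At this point a very easily applied test of this assumption is available. For subject to
the assumption we have

$$\sum_{i=1}^{r} \pi_i(x) = 1 - G_r(x) = \Phi\!\left(\frac{\mu_r - x}{\theta_r}\right)$$

and

$$\frac{\mu_r - x}{\theta_r} = \Phi^{-1}\!\left(\sum_{i=1}^{r} \pi_i(x)\right).$$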
















J. AITCHISON AND S. D. SILVEY 133
Now if $n_{\alpha i}$ denotes the number of insects observed in stage i at time $x_\alpha$, $N_\alpha$ is the total number
observed at time $x_\alpha$, and $N_\alpha p_{\alpha i} = n_{\alpha i}$, then $p_{\alpha i}$ estimates $\pi_i(x_\alpha)$ and $\sum_{i=1}^{r} p_{\alpha i}$ estimates
$\sum_{i=1}^{r} \pi_i(x_\alpha)$. Hence if

$$Y_{\alpha r} = \Phi^{-1}\!\left(\sum_{i=1}^{r} p_{\alpha i}\right),$$

the equivalent normal deviate of $\sum_{i=1}^{r} p_{\alpha i}$, then, for fixed r, the
points $(x_\alpha, Y_{\alpha r})$ ($\alpha = 1, 2, \ldots, m$) will be well fitted by a straight line. By plotting these points
and considering for each r their nearness to linearity we can decide whether or not the
assumption of normality is justified and the model adequate.
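The equivalent normal deviates for the data of Table 1 can be computed as follows. This is a sketch rather than the authors' procedure: the conversion of dates to days since 30 April (29, 34, 45, 57, 80) is our own arithmetic, and the function name is hypothetical.

```python
from statistics import NormalDist

# Table 1: days since 30 April, and counts in stages 1..6 at each date.
times = [29, 34, 45, 57, 80]
counts = [
    [31, 2, 0, 0, 0, 0],
    [78, 21, 0, 0, 0, 0],
    [18, 90, 38, 4, 0, 0],
    [0, 31, 63, 23, 1, 0],
    [0, 0, 2, 12, 65, 21],
]

def normal_deviates(row):
    """Y_r = Phi^{-1}(p_1 + ... + p_r) for r = 1..s; None where the proportion is 0 or 1."""
    total = sum(row)
    out, running = [], 0
    for n in row[:-1]:                  # the last stage carries no parameter
        running += n
        if 0 < running < total:
            out.append(NormalDist().inv_cdf(running / total))
        else:
            out.append(None)            # the deviate is infinite and cannot be plotted
    return out

deviates = [normal_deviates(row) for row in counts]
```

Points returned as `None` are exactly those that cannot contribute to a fitted line, which is why only some values of r have more than one usable point.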








Fig. 1. Equivalent normal deviate of $\sum_{i=1}^{r} p_{\alpha i}$ plotted against time x (a separate symbol is used for each of r = 1, 2, ..., 5).


The diagram for the above set of data is given in Fig. 1. In considering this diagram it
seems reasonable to give little weight to those points based on values of $\sum_{i=1}^{r} p_{\alpha i}$ near 0 or 1,
since small variations in numbers observed can cause considerable movement of such points.
It will be seen that for those values of r for which more than one 'reasonable' point is
available, these points lie very near straight lines.


4. ESTIMATION 


We are now in a position to estimate the unknown parameters $\mu_i$ (i = 1, 2, ..., s) by the
method of maximum likelihood. For now the likelihood of the observed set of data can be
expressed explicitly in terms of the 2s unknown parameters $\mu_i$ and $\theta_i$ (i = 1, 2, ..., s).
While the 2s maximum-likelihood equations for their estimates are readily derived they
are not simple and have to be solved numerically by an iterative procedure.

We remark at this point that the solution of these equations is tantamount to the fitting
of straight lines to each of the sets of points $(x_\alpha, Y_{\alpha r})$ ($\alpha = 1, 2, \ldots, m$). Initial approximations
to the maximum-likelihood estimates can be obtained by fitting straight lines
roughly to these sets of points. For given r a straight line fitted thus to the points $(x_\alpha, Y_{\alpha r})$
($\alpha = 1, 2, \ldots, m$) will cross the x-axis near the maximum-likelihood estimate of $\mu_r$, while the
gradient will approximate to the maximum-likelihood estimate of $-\theta_r^{-1}$.

However, we omit further details of this case since these are similar to those given in
§ 6, where the procedure of obtaining estimates using a modified model is demonstrated
for the data of § 1.
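A rough line fit of the kind described can be sketched with ordinary unweighted least squares. This is an assumption on our part: the text speaks only of fitting lines "roughly" and of giving little weight to extreme points, so the function below (a hypothetical name) is one possible reading, shown here for r = 1.

```python
from statistics import NormalDist, mean

def initial_estimates(xs, cum_props):
    """Least-squares line through (x, Phi^{-1}(p)); returns (mu_r, theta_r).

    Points with p = 0 or p = 1 are dropped, since their deviates are infinite.
    """
    pts = [(x, NormalDist().inv_cdf(p)) for x, p in zip(xs, cum_props) if 0 < p < 1]
    xbar = mean(x for x, _ in pts)
    ybar = mean(y for _, y in pts)
    sxx = sum((x - xbar) ** 2 for x, _ in pts)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
    slope = sxy / sxx                   # approximates -1 / theta_r
    intercept = ybar - slope * xbar
    return -intercept / slope, -1.0 / slope

# r = 1: proportion still in stage 1 at days 29, 34, 45, 57, 80 (Table 1).
xs = [29, 34, 45, 57, 80]
p1 = [31 / 33, 78 / 99, 18 / 150, 0 / 118, 0 / 100]
mu1, theta1 = initial_estimates(xs, p1)
```

The x-axis crossing gives the approximation to $\mu_1$ and the negative reciprocal of the gradient gives the approximation to $\theta_1$.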


5. MODIFICATIONS OF THE BASIC MODEL 


It will be seen from the discussion in § 4 that the given data are really insufficient for good
estimation of $\mu_1, \mu_2, \ldots, \mu_5$ using the basic model, since of the five straight lines to be fitted
only three are being fitted to 'reasonable' sets of points. This, together with other considerations
based less on expediency, suggests modifications to the basic model which involve a
reduction of the number of unknown parameters and which might prove useful in similar
situations.

The variance $\sigma_i^2$ of the random variable $\xi_i$ describing the time spent in the ith stage has
been regarded, up to this point, as a parameter independent of $\lambda_i$, the mean time spent in
stage i. Further, no assumption has been made regarding the dependence or independence
of the random variables $\xi_i$ (i = 1, 2, ..., s). By making assumptions of this nature we can
reduce the number of parameters to be estimated.

There are two aims to be achieved in introducing such additional assumptions: first, to
make them as natural as possible, and secondly, to ensure that from the point of view of
computation, the estimation problem they give rise to is practicable. The 'natural' assumptions
make $\sigma_i^2$ a simple function of the mean $\lambda_i$, whereas in order that the computation should
not be too complicated, we would like to assume that $\theta_r^2$, the variance of $\eta_r$, is a simple function
of $\mu_r$, the mean of $\eta_r$. Sometimes these aims are compatible; sometimes they are in
conflict. To illustrate this we consider the following three 'natural' assumptions


(i) $\sigma_i^2 = \sigma^2$,
(ii) $\sigma_i^2 = \sigma^2 \lambda_i$,
(iii) $\sigma_i^2 = \sigma^2 \lambda_i^2$,

where in each case $\sigma^2$ is a constant. If now we further assume independence of the $\xi_i$'s,
i.e. if we assume that the time spent by an insect in any one stage is independent of the time
spent by it in any other stage, then, in terms of $\theta_r$ and $\mu_r$, these assumptions become

(i)′ $\theta_r^2 = r\sigma^2$,
(ii)′ $\theta_r^2 = \mu_r \sigma^2$,
(iii)′ $\theta_r^2 = \bigl[\mu_1^2 + \sum_{i=2}^{r} (\mu_i - \mu_{i-1})^2\bigr]\sigma^2$.

From both points of view the assumptions (i)′ and (ii)′ are 'good'. However, while (iii)′
is a very natural assumption it leads to a very unwieldy estimation problem. We would
like to be able to replace (iii)′ by $\theta_r^2 = \mu_r^2 \sigma^2$. But this is an artificial assumption which would
be consistent with the assumption (iii) only if we assume very high correlation between any
two of the $\xi_i$'s.
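The passage from assumptions (i) to (iii) to their primed forms is just the addition of variances of independent stage times. A minimal sketch follows; the function name is hypothetical and the cumulative means are illustrative values only.

```python
def theta_squared(mu, sigma2, assumption):
    """Variance of eta_r, r = 1..s, for independent stage times.

    mu holds the cumulative means mu_1..mu_s; assumption is "i", "ii" or "iii".
    """
    lam = [mu[0]] + [b - a for a, b in zip(mu, mu[1:])]   # stage means lambda_i
    per_stage = {
        "i": [sigma2 for _ in lam],
        "ii": [sigma2 * l for l in lam],
        "iii": [sigma2 * l * l for l in lam],
    }[assumption]
    out, running = [], 0.0
    for v in per_stage:                 # variances add under independence
        running += v
        out.append(running)
    return out

mu = [38.0, 50.7, 62.7, 72.0, 88.0]    # illustrative cumulative means
v_i = theta_squared(mu, 1.0, "i")
v_ii = theta_squared(mu, 1.0, "ii")
v_iii = theta_squared(mu, 1.0, "iii")
```

Under (ii) the running sum of the $\lambda_i$ is simply $\mu_r$, which is why (ii)′ is so convenient; under (iii) no such collapse occurs.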

The assumption which seemed to the authors most reasonable for the given set of data
was the assumption (ii) in conjunction with independence of the $\xi_i$'s, i.e. essentially the
assumption (ii)′. An initial test of whether such an assumption is valid is obtained by reference
to Fig. 1. For if the assumption is valid then a straight line fitted to the set of points
$(x_\alpha, Y_{\alpha r})$, r fixed ($\alpha = 1, 2, \ldots, m$), should approximate to the line with equation

$$Y = \frac{\mu_r - x}{\sigma\sqrt{\mu_r}}.$$

From the point where this fitted line crosses the x-axis and from its gradient, an estimate of
σ may be obtained, without difficulty. If, as r varies, the different estimates of σ thus
obtained are in fairly close agreement and show no definite trend as r increases, then the
additional assumption (ii)′ may be taken to be justified. This was found to be the case for
the given set of data, if again not too much weight is attached to points based on values of
$\sum_{i=1}^{r} p_{\alpha i}$ near 0 or near 1.

Usually it will be possible to test an additional assumption of this nature in a similar
fashion.
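The estimate of σ implied by each fitted line follows from its gradient and its x-axis crossing. The sketch below uses hypothetical gradients chosen only to illustrate the calculation; they are not read from Fig. 1.

```python
import math

def sigma_from_line(slope, x_crossing):
    """sigma implied by a fitted line Y = (mu_r - x) / (sigma * sqrt(mu_r)).

    The line crosses the x-axis at mu_r and has gradient -1 / (sigma * sqrt(mu_r)).
    """
    if slope >= 0 or x_crossing <= 0:
        raise ValueError("expected a falling line crossing the axis at x > 0")
    return -1.0 / (slope * math.sqrt(x_crossing))

# Hypothetical fitted lines (gradient, x-axis crossing) for r = 1, 2, 3.
lines = [(-0.160, 38.0), (-0.140, 50.7), (-0.126, 62.7)]
sigmas = [sigma_from_line(b, x0) for b, x0 in lines]
```

Close agreement among the values in `sigmas`, with no trend in r, is the informal check on assumption (ii)′ described above.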


6. APPLICATION TO GIVEN DATA 


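Summarizing the model used to fit the given data we have: the probability $\pi_i(x_\alpha)$ that at
time $x_\alpha$ an insect chosen at random will be in the ith stage is given by

$$\begin{aligned}
\pi_1(x_\alpha) &= 1 - \Phi\!\left(\frac{x_\alpha - \mu_1}{\sigma\sqrt{\mu_1}}\right), \\
\pi_i(x_\alpha) &= \Phi\!\left(\frac{x_\alpha - \mu_{i-1}}{\sigma\sqrt{\mu_{i-1}}}\right) - \Phi\!\left(\frac{x_\alpha - \mu_i}{\sigma\sqrt{\mu_i}}\right) \quad (i = 2, 3, \ldots, s), \\
\pi_{s+1}(x_\alpha) &= \Phi\!\left(\frac{x_\alpha - \mu_s}{\sigma\sqrt{\mu_s}}\right),
\end{aligned}$$

where $\Phi(z) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{z} e^{-\frac{1}{2}t^2}\,dt$. Also the likelihood of the observed data is given by

$$\log L = k + \sum_{\alpha=1}^{m} \sum_{i=1}^{s+1} n_{\alpha i} \log \pi_i(x_\alpha),$$

where k is a constant and $n_{\alpha i}$ is the number of insects observed in stage i at time $x_\alpha$.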
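The model and its log-likelihood are simple to evaluate numerically. The sketch below is an illustration under stated assumptions: dates are converted to days since 30 April by our own arithmetic, the function names are hypothetical, and the parameter values are the maximum-likelihood estimates quoted in Table 2.

```python
import math
from statistics import NormalDist

def stage_probs(x, mu, sigma):
    """pi_1..pi_{s+1} at time x when var(eta_r) = mu_r * sigma**2."""
    g = [1.0] + [NormalDist().cdf((x - m) / (sigma * math.sqrt(m))) for m in mu] + [0.0]
    return [g[i] - g[i + 1] for i in range(len(g) - 1)]

def log_likelihood(times, counts, mu, sigma):
    """Sum of n * log(pi) over dates and stages, omitting the constant k."""
    total = 0.0
    for x, row in zip(times, counts):
        for n, p in zip(row, stage_probs(x, mu, sigma)):
            if n > 0:                   # cells with n = 0 contribute nothing
                total += n * math.log(p)
    return total

times = [29, 34, 45, 57, 80]            # days since 30 April (Table 1 dates)
counts = [[31, 2, 0, 0, 0, 0], [78, 21, 0, 0, 0, 0], [18, 90, 38, 4, 0, 0],
          [0, 31, 63, 23, 1, 0], [0, 0, 2, 12, 65, 21]]
mu_hat, sigma_hat = [38.3, 50.9, 62.5, 71.9, 87.9], 1.01   # Table 2
expected_3_june = [99 * p for p in stage_probs(34, mu_hat, sigma_hat)]
ll = log_likelihood(times, counts, mu_hat, sigma_hat)
```

Multiplying the stage probabilities by the sample size at a date gives the expected numbers for that date, here for 3 June.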
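We now have

$$\frac{\partial \pi_i(x_\alpha)}{\partial \mu_i} = -\frac{\partial \pi_{i+1}(x_\alpha)}{\partial \mu_i} = \frac{x_\alpha + \mu_i}{2\sigma\mu_i\sqrt{\mu_i}}\, Z_{\alpha i} \quad (i = 1, 2, \ldots, s),$$

where $Z_{\alpha i} = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z_{\alpha i}^2}$ and $z_{\alpha i} = \dfrac{x_\alpha - \mu_i}{\sigma\sqrt{\mu_i}}$.

Also

$$\frac{\partial \log L}{\partial \mu_i} = \sum_{\alpha=1}^{m} \left[\frac{n_{\alpha i}}{\pi_i(x_\alpha)} - \frac{n_{\alpha,i+1}}{\pi_{i+1}(x_\alpha)}\right] \frac{\partial \pi_i(x_\alpha)}{\partial \mu_i} \quad (i = 1, 2, \ldots, s).$$

Further

$$\frac{\partial \pi_i(x_\alpha)}{\partial \sigma} = \frac{1}{\sigma}\bigl[z_{\alpha i} Z_{\alpha i} - z_{\alpha,i-1} Z_{\alpha,i-1}\bigr] \quad (i = 2, 3, \ldots, s)$$

and

$$\frac{\partial \log L}{\partial \sigma} = \frac{1}{\sigma} \sum_{\alpha=1}^{m} \sum_{i=1}^{s} \left[\frac{n_{\alpha i}}{\pi_i(x_\alpha)} - \frac{n_{\alpha,i+1}}{\pi_{i+1}(x_\alpha)}\right] z_{\alpha i} Z_{\alpha i}.$$

The problem of estimating the $\mu_i$'s is thus reduced to the problem of solving a set of s+1
equations each of which involves only functions which are either tabulated or easily
calculated. (The only tabulated functions which it was found necessary to use were tables
of the normal ordinate and integral and the computational procedure follows naturally
from the form in which the formulae are given.)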
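The derivative of a stage probability with respect to its own $\mu_i$ can be checked against a finite difference. The closed form used below, $(x + \mu_i) Z / (2\sigma\mu_i^{3/2})$, is what differentiation of $-\Phi\bigl((x - \mu_i)/(\sigma\sqrt{\mu_i})\bigr)$ gives; the function names and the numerical point are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_value(x, mu_i, sigma):
    return (x - mu_i) / (sigma * math.sqrt(mu_i))

def dpi_dmu(x, mu_i, sigma):
    """Closed form for d pi_i / d mu_i = (x + mu_i) Z / (2 sigma mu_i^{3/2})."""
    big_z = NormalDist().pdf(z_value(x, mu_i, sigma))      # normal ordinate
    return (x + mu_i) * big_z / (2.0 * sigma * mu_i ** 1.5)

def mu_dependent_part(x, mu_i, sigma):
    """The part of pi_i that depends on mu_i, namely -Phi(z_i)."""
    return -NormalDist().cdf(z_value(x, mu_i, sigma))

x, mu_i, sigma, h = 45.0, 50.9, 1.01, 1e-5
numeric = (mu_dependent_part(x, mu_i + h, sigma)
           - mu_dependent_part(x, mu_i - h, sigma)) / (2 * h)
analytic = dpi_dmu(x, mu_i, sigma)
```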










These equations were solved in the present instance by an iterative procedure which is
essentially that used in probit analysis (see, for example, Finney, 1947). This involves
(i) obtaining initial approximations to the roots of the equations, (ii) evaluating W, the
information matrix with the unknown parameters involved replaced by these initial
approximations, (iii) evaluating $W^{-1}$ and (iv) obtaining corrections to the initial approximation
by multiplying by $W^{-1}$ the vector of partial derivatives of log L with respect to the
parameters, calculated with these parameters replaced by the initial approximations.

This method is a modified form of Newton's method of solving numerically a set of simultaneous
equations and has the advantage over the latter method that it yields finally,
without additional computation, an estimate of the variance matrix of the estimators.
Also the matrix is more easily computed than that used in Newton's method.

We now discuss the points (i) and (ii) for our particular case. 

(i) We emphasize again that the solution of the maximum-likelihood equations is
tantamount to the fitting of straight lines to the sets of points $(x_\alpha, Y_{\alpha r})$, r fixed ($\alpha = 1, 2, \ldots, m$).
Hence initial approximations to the roots of these equations can be obtained by fitting straight
lines roughly to these sets of points. In order to reduce the number of iterations required
in the process of solution it seems advisable to take considerable care with the fitting of these
lines, and in so far as possible to attach weights to different points on the basis previously
discussed. Moreover, scrutiny of this diagram can yield considerable information about
how well we can expect different parameters to be estimated. It is obvious, for instance,
that in the present case we can expect $\mu_1$, $\mu_2$ and $\mu_3$ to be well estimated and the remaining
$\mu$'s not so well estimated. Hence before any heavy computations are carried out the
experimenter can decide whether the experiment he has performed is likely to yield as much
information as he desires or whether it might be necessary, for example, to repeat the
experiment with observations made at closer intervals.

Fig. 2 shows the initial lines fitted in the present case. The lines resulting from the final
solution of the maximum-likelihood equations are virtually indistinguishable from these.
The corresponding initial and final estimates of the parameters are given in Table 2. Two
iterations were required to produce the final solution of the maximum-likelihood equations.
It will be seen that these produce little change in the initial approximations to the estimates
of the $\mu_i$'s, the parameters in which we are primarily interested. So graphical estimation
of these parameters may be quite efficient.


Table 2

Parameter                       $\mu_1$   $\mu_2$   $\mu_3$   $\mu_4$   $\mu_5$   $\sigma$
Initial approximation           38.0      50.7      62.7      72.0      88.0      1.00
Maximum-likelihood estimate     38.3      50.9      62.5      71.9      87.9      1.01


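(ii) For the evaluation of W we have

$$E\!\left(-\frac{\partial^2 \log L}{\partial \mu_i^2}\right) = \sum_{\alpha=1}^{m} N_\alpha \left[\frac{1}{\pi_i(x_\alpha)} + \frac{1}{\pi_{i+1}(x_\alpha)}\right] \left[\frac{\partial \pi_i(x_\alpha)}{\partial \mu_i}\right]^2 \quad (i = 1, 2, \ldots, s),$$

$$E\!\left(-\frac{\partial^2 \log L}{\partial \mu_i\, \partial \mu_{i+1}}\right) = -\sum_{\alpha=1}^{m} \frac{N_\alpha}{\pi_{i+1}(x_\alpha)}\, \frac{\partial \pi_i(x_\alpha)}{\partial \mu_i}\, \frac{\partial \pi_{i+1}(x_\alpha)}{\partial \mu_{i+1}} \quad (i = 1, 2, \ldots, s-1)$$

and

$$E\!\left(-\frac{\partial^2 \log L}{\partial \mu_i\, \partial \mu_j}\right) = 0 \quad \text{if } |i - j| \ge 2.$$

Also

$$E\!\left(-\frac{\partial^2 \log L}{\partial \mu_i\, \partial \sigma}\right) = \sum_{\alpha=1}^{m} \left[\frac{N_\alpha}{\pi_i(x_\alpha)} \frac{\partial \pi_i(x_\alpha)}{\partial \sigma} - \frac{N_\alpha}{\pi_{i+1}(x_\alpha)} \frac{\partial \pi_{i+1}(x_\alpha)}{\partial \sigma}\right] \frac{\partial \pi_i(x_\alpha)}{\partial \mu_i} \quad (i = 1, 2, \ldots, s)$$

and

$$E\!\left(-\frac{\partial^2 \log L}{\partial \sigma^2}\right) = \sum_{\alpha=1}^{m} \sum_{i=1}^{s+1} \frac{N_\alpha}{\pi_i(x_\alpha)} \left[\frac{\partial \pi_i(x_\alpha)}{\partial \sigma}\right]^2.$$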


In the computation of these expectations it is necessary to calculate the $1/[\pi_i(x_\alpha)]$'s. Sometimes
$\pi_i(x_\alpha)$ will be zero to the accuracy of the calculations. The terms in which such
$1/[\pi_i(x_\alpha)]$'s are involved will invariably be negligible and as a working rule we may take
$1/[\pi_i(x_\alpha)] = 0$ whenever $\pi_i(x_\alpha) = 0$.

Fig. 2. Initial lines fitted to points $(x_\alpha, Y_{\alpha r})$ (equivalent normal deviate Y plotted against time x).

The information matrix takes a simple form in this case and advantage may be taken 
of the empty cells in it in the inversion process. The authors inverted it by successive 
bordering (Frazer, Duncan & Collar, 1947, p. 112). 
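The whole iterative procedure can be sketched in a few dozen lines. The code below is an illustration under stated assumptions, not the authors' computation: derivatives of the stage probabilities are taken by central differences rather than from the closed forms, the information matrix uses the standard multinomial expression (sample size times the sum over stages of the products of derivatives divided by the probability), the linear system is solved by Gaussian elimination rather than by successive bordering, dates are converted to days since 30 April by our own arithmetic, and all names are hypothetical.

```python
import math
from statistics import NormalDist

TIMES = [29, 34, 45, 57, 80]            # days since 30 April (Table 1)
COUNTS = [[31, 2, 0, 0, 0, 0], [78, 21, 0, 0, 0, 0], [18, 90, 38, 4, 0, 0],
          [0, 31, 63, 23, 1, 0], [0, 0, 2, 12, 65, 21]]

def stage_probs(x, params):
    """pi_1..pi_{s+1}; params = [mu_1, ..., mu_s, sigma]."""
    *mu, sigma = params
    g = [1.0] + [NormalDist().cdf((x - m) / (sigma * math.sqrt(m))) for m in mu] + [0.0]
    return [g[i] - g[i + 1] for i in range(len(g) - 1)]

def jacobian(x, params, h=1e-5):
    """d pi_i / d param_a by central differences; rows index the parameters."""
    rows = []
    for a in range(len(params)):
        up, down = list(params), list(params)
        up[a] += h
        down[a] -= h
        rows.append([(u - d) / (2 * h)
                     for u, d in zip(stage_probs(x, up), stage_probs(x, down))])
    return rows

def score_and_information(params, tiny=1e-10):
    k = len(params)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, row in zip(TIMES, COUNTS):
        n_total = sum(row)
        probs, jac = stage_probs(x, params), jacobian(x, params)
        for i, (n, p) in enumerate(zip(row, probs)):
            if p < tiny:                # working rule: take 1/pi = 0 when pi = 0
                continue
            for a in range(k):
                score[a] += n / p * jac[a][i]
                for b in range(k):
                    info[a][b] += n_total / p * jac[a][i] * jac[b][i]
    return score, info

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return x

params = [38.0, 50.7, 62.7, 72.0, 88.0, 1.00]   # initial approximations (Table 2)
for _ in range(5):
    score, info = score_and_information(params)
    params = [p + d for p, d in zip(params, solve(info, score))]
```

Each pass recomputes the information matrix at the current values, whereas the text notes that $W^{-1}$ from the initial approximations may be reused; recomputing is simply the more convenient choice on a computer.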

The matrix $W^{-1}$ is an estimate of the variance matrix V of the maximum-likelihood
estimators of the unknown parameters and $W^{-1}$ may be used without alteration in successive
iterations to obtain corrections to successive approximations to maximum-likelihood
estimates. However, a better estimate of V may be obtained finally by computing $W_f$,
the information matrix with unknown parameters replaced by their maximum-likelihood
estimates, and using $W_f^{-1}$ as an estimate of V. For the given set of data we have

$$W^{-1} = \begin{bmatrix}
 0.3080 &  0.0419 & -0.0065 &  0.0000 & -0.0530 & -0.0050 \\
 0.0419 &  0.3890 &  0.1156 &  0.0333 &  0.0161 &  0.0006 \\
-0.0065 &  0.1156 &  0.7835 &  0.2238 &  0.1521 &  0.0084 \\
 0.0000 &  0.0333 &  0.2238 &  0.9988 &  0.2942 &  0.0016 \\
-0.0530 &  0.0161 &  0.1521 &  0.2942 &  2.2144 &  0.0239 \\
-0.0050 &  0.0006 &  0.0084 &  0.0016 &  0.0239 &  0.0022
\end{bmatrix},$$

while

$$W_f^{-1} = \begin{bmatrix}
 0.3090 &  0.0423 & -0.0078 & -0.0013 & -0.0608 & -0.0058 \\
 0.0423 &  0.3742 &  0.1218 &  0.0361 &  0.0229 &  0.0012 \\
-0.0078 &  0.1218 &  0.7793 &  0.2308 &  0.1572 &  0.0088 \\
-0.0013 &  0.0361 &  0.2308 &  1.0123 &  0.3056 &  0.0022 \\
-0.0608 &  0.0229 &  0.1572 &  0.3056 &  2.2356 &  0.0249 \\
-0.0058 &  0.0012 &  0.0088 &  0.0022 &  0.0249 &  0.0023
\end{bmatrix},$$

where in each case the elements of the main diagonal refer in order to $\mu_1, \mu_2, \ldots, \mu_5$ and σ.
In this case initial approximations happened to be so good that the differences between
corresponding elements of $W^{-1}$ and $W_f^{-1}$ are very small. However, if initial approximations
had not been so good there might have been considerable differences in the corresponding
elements of these matrices and so $W_f^{-1}$ should always be computed.

As mentioned above, estimates $\hat\lambda_i$ of the mean times spent in the different stages, i.e. of
$\lambda_i$ (i = 1, 2, ..., 5), are obtained by differencing estimates of the $\mu_i$'s. Also an estimate of the
variance matrix of the corresponding estimators of $\lambda_i$ (i = 1, 2, ..., 5) and σ is obtained by
forming the matrix $V_{\lambda,\sigma} = A W_f^{-1} A'$, where

$$A = \begin{bmatrix}
 1 &  0 &  0 &  0 & 0 & 0 \\
-1 &  1 &  0 &  0 & 0 & 0 \\
 0 & -1 &  1 &  0 & 0 & 0 \\
 0 &  0 & -1 &  1 & 0 & 0 \\
 0 &  0 &  0 & -1 & 1 & 0 \\
 0 &  0 &  0 &  0 & 0 & 1
\end{bmatrix}$$

and A′ is the transpose of A.
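The differencing step and the transformed variance matrix are plain matrix products. The sketch below uses the values of $W_f^{-1}$ as read from the matrix given earlier (symmetric entries taken to agree) and the maximum-likelihood estimates of Table 2; the helper names are hypothetical.

```python
W_F_INV = [
    [0.3090, 0.0423, -0.0078, -0.0013, -0.0608, -0.0058],
    [0.0423, 0.3742, 0.1218, 0.0361, 0.0229, 0.0012],
    [-0.0078, 0.1218, 0.7793, 0.2308, 0.1572, 0.0088],
    [-0.0013, 0.0361, 0.2308, 1.0123, 0.3056, 0.0022],
    [-0.0608, 0.0229, 0.1572, 0.3056, 2.2356, 0.0249],
    [-0.0058, 0.0012, 0.0088, 0.0022, 0.0249, 0.0023],
]
A = [
    [1, 0, 0, 0, 0, 0],
    [-1, 1, 0, 0, 0, 0],
    [0, -1, 1, 0, 0, 0],
    [0, 0, -1, 1, 0, 0],
    [0, 0, 0, -1, 1, 0],
    [0, 0, 0, 0, 0, 1],
]

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

estimates = [38.3, 50.9, 62.5, 71.9, 87.9, 1.01]        # mu_1..mu_5, sigma
lam = [sum(a * e for a, e in zip(row, estimates)) for row in A]
V = matmul(matmul(A, W_F_INV), transpose(A))
```

The assertions used to check this sketch cover only a handful of entries of V; no claim is made here about the remainder.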

In the present instance we find that

$$\hat\lambda_1 = 38.3,\quad \hat\lambda_2 = 12.6,\quad \hat\lambda_3 = 11.6,\quad \hat\lambda_4 = 9.4,\quad \hat\lambda_5 = 16.0,\quad \hat\sigma = 1.01,$$

$$V_{\lambda,\sigma} = \begin{bmatrix}
 0.309 & -0.267 & -0.050 &  0.006 & -0.060 & -0.006 \\
-0.267 &  0.572 & -0.202 & -0.092 &  0.046 &  0.007 \\
-0.050 & -0.202 &  0.910 & -0.463 & -0.060 &  0.008 \\
 0.006 & -0.092 & -0.463 &  1.330 & -0.633 & -0.007 \\
-0.060 &  0.046 & -0.060 & -0.633 &  2.637 &  0.023 \\
-0.006 &  0.007 &  0.008 & -0.007 &  0.023 &  0.0034
\end{bmatrix}.$$













It will be seen that the estimates of the $\lambda$'s and σ are such that they do not invalidate our
assumption that the standard deviation of each $\xi_i$ is small relative to its mean.

If we assume that samples are large it is now possible to obtain, in the usual way, confidence
intervals for the mean times spent in the different stages. Also, for example, if we
had data on different batches of insects we could clearly test whether the times spent in
the same stage by different batches were significantly different from one another.
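Large-sample confidence intervals of the usual kind follow from an estimate and its variance. The sketch below assumes approximate normality of the estimators and uses, for illustration, the estimate of $\lambda_1$ and the first diagonal element of the variance matrix quoted above; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(estimate, variance, level=0.95):
    """Two-sided large-sample interval: estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

# Mean time in stage 1: estimate 38.3 with estimated variance 0.309.
low, high = confidence_interval(38.3, 0.309)
```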

We conclude this paragraph by comparing in Table 3 the expected number of insects
in each stage at each time (when unknown parameters are replaced by the above estimates),
with the corresponding observed number. If entries in this table where expectations are
less than 5 are pooled with neighbouring entries at the same date in the obvious way (so
that we have a total of twelve observed and expected numbers to compare) then a $\chi^2$-test
shows no significant differences between observed and expected numbers. Since a total of
six parameters has been estimated the number of degrees of freedom associated with $\chi^2$
is 12 − 6 = 6.
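The pooled comparison can be reproduced from Table 3. The grouping of stages below is our reading of "the obvious way" (cells with expectation under 5 merged with adjacent cells at the same date, giving twelve cells in all) and is an assumption; the upper 5% point of $\chi^2$ on 6 degrees of freedom, 12.59, is taken from standard tables.

```python
observed = [[31, 2, 0, 0, 0, 0], [78, 21, 0, 0, 0, 0], [18, 90, 38, 4, 0, 0],
            [0, 31, 63, 23, 1, 0], [0, 0, 2, 12, 65, 21]]
expected = [[30.8, 2.2, 0, 0, 0, 0], [74.8, 23.3, 0.9, 0, 0, 0],
            [21.3, 97.8, 28.8, 2.0, 0.1, 0], [0.2, 23.2, 65.6, 24.1, 4.8, 0.1],
            [0, 0, 1.4, 15.7, 62.6, 20.3]]

# Assumed pooling: stage indices (0-based) merged into one cell, date by date.
groups = [
    [(0, 1, 2, 3, 4, 5)],
    [(0,), (1, 2, 3, 4, 5)],
    [(0,), (1,), (2, 3, 4, 5)],
    [(0, 1), (2,), (3, 4, 5)],
    [(0, 1, 2, 3), (4,), (5,)],
]

cells = []
for obs_row, exp_row, date_groups in zip(observed, expected, groups):
    for g in date_groups:
        cells.append((sum(obs_row[i] for i in g), sum(exp_row[i] for i in g)))

chi_square = sum((o - e) ** 2 / e for o, e in cells)
degrees_of_freedom = len(cells) - 6     # six parameters estimated
significant_at_5_percent = chi_square > 12.59
```

With this pooling the statistic comes out a little under 10, well below the tabulated 5% point, in line with the conclusion stated in the text.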


Table 3. Observed and (expected) numbers of insects

                                           Stage
Date         1            2            3            4            5            6

29 May     31 (30.8)     2 (2.2)      0 (0)        0 (0)        0 (0)        0 (0)
3 June     78 (74.8)    21 (23.3)     0 (0.9)      0 (0)        0 (0)        0 (0)
14 June    18 (21.3)    90 (97.8)    38 (28.8)     4 (2.0)      0 (0.1)      0 (0)
26 June     0 (0.2)     31 (23.2)    63 (65.6)    23 (24.1)     1 (4.8)      0 (0.1)
19 July     0 (0)        0 (0)        2 (1.4)     12 (15.7)    65 (62.6)    21 (20.3)








7. GENERALIZATION OF PROBIT ANALYSIS 


The situation where the methods of quantal probit analysis are applicable is, in the cus- 
tomary terminology of this analysis, as follows. Random samples of subjects are subjected 
to various doses of a stimulus to each of which a subject may or may not respond. Any dose 
results in a dichotomy of the subjects to which it is applied into those responding and those 
not responding. A dose is said to be effective for a subject if it produces a response in that 
subject and the minimum effective dose for a subject is called its tolerance. One of the main 
objects of this type of analysis is to estimate mean tolerance (Finney, 1947). 

Clearly a situation might arise where in place of a simple dichotomy, subjects are divided
into more than two classes by any dose of the stimulus. Accordingly we envisage an experiment
where random samples of subjects are subjected to m doses $x_\alpha$ ($\alpha = 1, 2, \ldots, m$) of
a stimulus and as a result of the application of the dose $x_\alpha$ each subject is placed in one of
s+1 classes. A straightforward illustration of such an experiment is given by Tattersfield,
Gimingham & Morris (1925) who classified insects subjected to a poison as unaffected,
slightly affected, moribund or dead. The particular problem discussed above is another
illustration if, in this case, time is regarded as the stimulus.

We now ask what conditions must be satisfied in this general experiment in order that 
the method of analysis used in our particular case should be applicable. 













Study of § 2 shows that the crucial conditions are 
(i) the classes must be ordered, mutually exclusive and exhaustive, 

(ii) the reactions of a subject to increasing doses must be systematic in the sense that if
dose x places a subject in the ith class then a dose greater than x is required to place this
subject in the jth class whenever j is greater than i.

When these conditions are satisfied then the model of § 2 can be used to describe the
experimental situation in the following manner. If for a subject chosen at random the
tolerance of this subject for the ith class is $\eta_i$ (i = 1, 2, ..., s) (i.e. $\eta_i$ is the minimum dose
required to place this subject in the (i+1)th class) and if every subject is placed in the first
class by dose $\eta_0 = 0$, then $\eta_i - \eta_{i-1}$ can be regarded as an observation on a non-negative
random variable $\xi_i$ (i = 1, 2, ..., s), which describes the marginal tolerances of the subjects
for the ith class (i.e. the differences in tolerances for the ith and (i−1)th classes). The mean
value $\lambda_i$ of $\xi_i$ is then the mean marginal tolerance for the ith class. Further, the random
variable $\eta_r = \sum_{i=1}^{r} \xi_i$ describes the tolerances of subjects for the rth class, and $\mu_r = E(\eta_r)$ is
the mean tolerance for the rth class. Methods similar to that given subsequently to § 2 may
then be applied to estimate the $\mu_i$'s and/or $\lambda_i$'s, whichever happen to be of biological interest.
We emphasize again that the assumption of normality of the $\xi_i$'s which we have used in our
illustration is possible only if it can be assumed that the standard deviation of each $\xi_i$ is
small compared with its mean.

We remark finally that if s = 1 then the present analysis becomes an ordinary probit 
analysis and it is in this sense that we have generalized probit analysis. 


REFERENCES 


Delany, M. J. (1957). The life history and ecology of the genus Petrobius Leach (Thysanura,
Machilidae) (in the Press).

Finney, D. J. (1947). Probit Analysis. Cambridge University Press.

Frazer, R. A., Duncan, W. J. & Collar, A. R. (1947). Elementary Matrices and Some Applications
to Dynamics and Differential Equations. Cambridge University Press.

Tattersfield, F., Gimingham, C. T. & Morris, H. M. (1925). Ann. Appl. Biol. 12, 61.







EXPERIMENTING WITH ORGANISMS AS BLOCKS 


By S. C. PEARCE
East Malling Research Station 


An experimenter may usually assume that his treatments will have little effect outside the
plots† to which each is applied, but sometimes this is not justified. Thus, if an animal is
injected at several places with different strains of inoculum it is conceivable that a strain
that produces a large lesion locally will aggravate lesions elsewhere. Again, a tree may have
each branch pruned according to a different method, and it is possible that a method which
encourages cropping on the branch to which it is applied will encourage cropping elsewhere
on the tree, though to a smaller degree. Nor are remote effects always of the same sign as the
local ones. Thus, if the blossoms of a plant are grouped and each group is pollinated with a
different variety, it may well be that a kind of pollen which induces high fruit-set locally
may thereby inhibit fruiting elsewhere.

The existence and importance of such remote effects is often problematical, and may 
require to be investigated. Experience with the method to be described has shown that 
there are sometimes remote effects of considerable biological interest. On the other hand, 
it may appear that none exist, and simpler methods can then be used with confidence in 
the future. 


Notation 


The local effect of a treatment on the plots to which it is applied will be represented by the
parameter $\gamma_j$, where j is the number of the treatment. Its remote effect on the other plots
of the same block will be written $\delta_j$. It will be assumed that it has no effects in other blocks,
which will consist of different organisms. It will also be assumed that the remote effect is
the same on all other plots of the block; this is a biological problem, but one that is fairly
easy to decide, because the answer can be found in the physiological symmetry or asymmetry
of the organism.

There is also the possibility of local and remote effects interacting. Thus, although it
has been assumed that the remote effect of treatment j on a given plot does not depend upon
the plot itself, it might depend upon the treatment that is being applied locally. The parameter
$\phi_{jk}$ will be taken as the additional effect of a plot receiving treatment k when
treatment j is being applied elsewhere in the block.

There are instances in which this concept of local and remote effects may not appeal to
the biologist. Thus, in the studies of the effects of different kinds of tuberculin described by
Paterson & Leech (1954), the concept was rather of a systemic effect over the whole
guinea-pig irrespective of the site of injection, and of a local effect at the site. Since, however,
the systemic effect equals $\delta_j$ in the notation of this paper, and the local effect as now
defined equals $(\gamma_j - \delta_j)$, little modification of the working is required.

The general parameter will be written as $\alpha$, and the parameter for block h as $\beta_h$.


† The word 'plot' is used because it is the usual statistical term for the unit to which experimental
treatments are applied. However, for trials of the kind to be considered these units will be very
dissimilar from 'plots' in an agricultural sense.





142 Experimenting with organisms as blocks 


The following equations of constraint will be used: 


$$0 = \sum_h \beta_h = \sum_j \gamma_j = \sum_j \delta_j = \sum_j \phi_{jk} = \sum_k \phi_{jk}.$$


THE CHOICE OF EXPERIMENTAL DESIGNS 


If an orthogonal design, e.g. randomized blocks, is used it is apparent that there is no means 
of separating the parameters of the remote effects, the evaluation of which is important. 
Thus, within a block all remote effects occur equally often and will be lost by virtue of the 
equation of constraint involving them. Nor will the parameters $\gamma_j$ and $\delta_j$ be separable in the 
expressions for treatment totals; thus, if injection of a branch of a tree with a nutrient solution increases the crop as compared with the other branches by 30 lb., it is not possible to 
say whether there has been a local effect of +30 and no remote effect, or a local effect of 
+50 and a remote effect of +20, or some other combination. For all the figures prove, there might have been 
a local effect of −30 and a remote effect of −60. 

If, however, balanced incomplete block designs are used, it is apparent that treatment 
comparisons within blocks will provide estimates of $(\gamma_j-\delta_j)$ for all j, as before; whereas 
comparisons between blocks will give estimates of $[\gamma_j+(k-1)\delta_j]$, where k is the number of 
plots to a block. Provided that the inter-block differences are determined with reasonable 
accuracy, it is thus possible to estimate $\gamma_j$ and $\delta_j$ separately to a useful degree of precision. 
The design proposed is, in effect, one of b kinds of block, with v treatments, k plots to a 
block, and n of each kind of block so that there are nb blocks in all. There is no necessity 
for n to exceed one, but in practice it is usually desirable if the inter-block comparisons are 
to be of use. 

Where it is required to take account of the interactions between local and remote effects, 
choice must be restricted to a narrower range of designs. Thus if the normal equations are 
to be manageable the design must be unreduced,† i.e. all combinations of the v treatments 
taken k at a time must occur equally often in blocks, so b must equal $\binom{v}{k}$. Also, if there are 
to be any degrees of freedom for error and if k = v−1, n must be at least two. Also, it will 
appear that k must be at least three, which means that v must be at least four. 
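
As a small illustration of an unreduced design, the sketch below lists every combination of v treatments taken k at a time and counts how often each pair of treatments occurs together. The function names `unreduced_design` and `concurrences` are illustrative only and do not come from the paper.

```python
from itertools import combinations
from math import comb


def unreduced_design(v, k):
    """Return one block for every k-subset of the treatments 1..v."""
    return [set(c) for c in combinations(range(1, v + 1), k)]


def concurrences(blocks, v):
    """Count the blocks in which each pair of treatments occurs together."""
    return {
        (i, j): sum(1 for blk in blocks if i in blk and j in blk)
        for i, j in combinations(range(1, v + 1), 2)
    }


# The case used in the numerical example of the paper: v = 4, k = 3.
blocks = unreduced_design(4, 3)
lam = concurrences(blocks, 4)
print(len(blocks), comb(4, 3))   # number of kinds of block, b
print(set(lam.values()))         # a single value means the design is balanced
```

With v = 4 and k = 3 there are four kinds of block, one for each treatment omitted, which is the arrangement used in the numerical example.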


ANALYSIS OF DATA 
(i) With interactions ignored 


This is not difficult. Let the grand total of all data be G, let the total of data from treatment 
j be $T_j$, and let the total from all blocks that do not contain treatment j be $U_j$. 


† It has been pointed out to the writer by Mr G. H. Freeman that this restriction to unreduced 
designs is sufficient to ensure that no normal equation shall contain more than two $\phi$ parameters 
(i.e. $\phi_{jt}$ and $\phi_{tj}$), but it is not necessary. The necessary restriction is this: considering only blocks that 
contain treatment i but not treatment j, and omitting from them all plots to receive treatment i, 
then, for all i and j, the remaining design must be in balanced incomplete blocks. Let the parameters 
of the remaining design be given the conventional symbols, b, k, r, v and $\lambda$, and let the corresponding 
capitals refer to the original design, then 
$$B=\frac{b(v+1)(v+2)}{(v-k+1)(k+1)},\quad K=k+1,\quad R=\frac{b(v+1)}{(v-k+1)},\quad V=v+2\quad\text{and}\quad\Lambda=\frac{bk}{(v-k+1)}.$$
The simplest integral solution of these equations, apart from those arising from unreduced designs, 
appears to be b = 18, k = 4, r = 8, v = 9, $\lambda$ = 3, B = 66, K = 5, R = 30, V = 11 and $\Lambda$ = 12. It is not 
known if such designs exist, but if they do they are too large to be of practical importance, and their 
possible existence does not justify complicating the algebra. Nevertheless, Mr Freeman's comment 
is of some theoretical interest. 










S. C. PEARCE 143 


Estimating treatment effects within blocks gives $(\gamma_j-\delta_j)$ equal to 
$$\frac{(v-1)}{nbk(k-1)}\,(kT_j+U_j-G).$$
Estimating them between blocks gives $[\gamma_j+(k-1)\delta_j]$ as 
$$\frac{(v-1)}{nbk(v-k)}\,[(v-k)G-vU_j].$$
Hence, estimate of $\gamma_j$ equals 
$$\frac{(v-1)}{nbk(v-k)}\,[(v-k)T_j-U_j],$$
and estimate of $\delta_j$ equals 
$$\frac{(v-1)}{nbk(k-1)(v-k)}\,[(v-k)G-(v-1)U_j-(v-k)T_j].$$
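
The four expressions above can be sketched in a few lines. The function name `estimate_effects` is hypothetical; the totals used to exercise it are those of Table 1 in the numerical example later in the paper (v = 4, k = 3, b = 4, n = 2).

```python
def estimate_effects(T, U, G, n, b, k):
    """Within-block, between-block and separated estimates for each treatment.

    T[j] is the total for treatment j, U[j] the total of blocks lacking
    treatment j, and G the grand total.
    """
    v = len(T)
    c = (v - 1) / (n * b * k)
    # (gamma_j - delta_j), estimated within blocks
    within = [c / (k - 1) * (k * Tj + Uj - G) for Tj, Uj in zip(T, U)]
    # gamma_j + (k - 1) delta_j, estimated between blocks
    between = [c / (v - k) * ((v - k) * G - v * Uj) for Uj in U]
    # Solve the two linear combinations for gamma_j and delta_j
    gamma = [(th + (k - 1) * d) / k for th, d in zip(between, within)]
    delta = [(th - d) / k for th, d in zip(between, within)]
    return within, between, gamma, delta


T = [136.81, 128.73, 124.89, 124.96]
U = [122.78, 132.24, 127.96, 132.41]
G = 515.39
within, between, gamma, delta = estimate_effects(T, U, G, n=2, b=4, k=3)
print([round(g, 3) for g in gamma])
print([round(d, 3) for d in delta])
```

Solving the two linear combinations in this way gives the same values as the closed expressions for the estimates of $\gamma_j$ and $\delta_j$.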


The standard errors of these estimates are likewise readily obtained. If M be the error 
mean square of the analysis between blocks and M' that of the analysis within, and writing 
$$M^{\circ}=M+\frac{(k-1)(v-k)}{v}M'$$
and 
$$M^{+}=M+\frac{(v-k)}{v(k-1)}M',$$
then 
$$[\text{standard error of estimate of }\gamma_j]^2=\frac{(v-1)^2M^{\circ}}{nbk^2(v-k)},$$
$$[\text{standard error of estimate of }(\gamma_j-\gamma_t)]^2=\frac{2v(v-1)M^{\circ}}{nbk^2(v-k)}.$$


The corresponding expressions for the standard errors of $\delta_j$ and $(\delta_j-\delta_t)$ are the same except 
for the replacement of $M^{\circ}$ by $M^{+}$. If k equals two, each treatment gives rise to the parameter for its remote effect as often as to the one for its local effect, and $M^{\circ}$ equals $M^{+}$. 
As k increases from two, $M^{+}$ becomes progressively smaller than $M^{\circ}$ and the remote effects 
are determined with ever greater accuracy as compared with the local effects. This is just 
as well, because the remote effects are likely to be the smaller and so accordingly are differences between them. 
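
A minimal sketch of these standard-error calculations follows. The function names are hypothetical, and the mean squares supplied are those quoted in the numerical example of the paper (M = 7.9216, M' = 0.4001).

```python
from math import sqrt


def combined_mean_squares(M, M_within, v, k):
    """Return the mean squares appropriate to local and to remote effects."""
    M_local = M + (k - 1) * (v - k) / v * M_within
    M_remote = M + (v - k) / (v * (k - 1)) * M_within
    return M_local, M_remote


def se_of_difference(M_combined, n, b, v, k):
    """Standard error of a difference between two estimated effects."""
    return sqrt(2 * v * (v - 1) * M_combined / (n * b * k ** 2 * (v - k)))


M_local, M_remote = combined_mean_squares(7.9216, 0.4001, v=4, k=3)
se_gamma = se_of_difference(M_local, n=2, b=4, v=4, k=3)
se_delta = se_of_difference(M_remote, n=2, b=4, v=4, k=3)
print(round(M_local, 4), round(M_remote, 4))
print(round(se_gamma, 3), round(se_delta, 3))
```

When k = 2 the two multipliers of M' coincide, so the two combined mean squares are equal, as the text states.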

If the treatments had been applied to whole organisms, estimates would have been 
obtained of the quantities $[\alpha+\gamma_j+(k-1)\delta_j]$. These values can also be obtained by adding 
their components; if this is done, the (standard error)² of the difference between two such 
estimates is 
$$\frac{2v(v-1)M}{nb(v-k)}.$$

It should be noted that just as the parameter $\gamma_j$ has been adjusted to $(\gamma_j-\delta_j)$ in the 
analysis within blocks, so also has $\beta_h$ been adjusted to $\beta_h^{*}=\beta_h-\sum\delta_j$, summation taking 
place over j for treatments lacking in block h. 


(ii) With interactions allowed for 


Where interactions are considered the equations already given still apply, because the 
expressions for $T_j$, $U_j$ and G will contain no parameters of the type $\phi_{jt}$. It follows that if 
$\phi$ is to be evaluated use will have to be made of further quantities; the following will be 
found convenient: 

$D_{jt}$, the total for treatment j in blocks from which treatment t is absent. 

$E_{jt}$ $(=E_{tj})$, the total for blocks containing neither treatment j nor t. 

The total for blocks not containing treatment t; this is of course $U_t$. 

Thus, $D_{jt}$, $E_{jt}$ and $U_t$ are respectively analogues of $T_j$, $U_j$ and G. It will be convenient to 
write 
$$F_{jt}=kD_{jt}+E_{jt}-U_t\quad\text{and}\quad V_j=kT_j+U_j-G.$$


The parametric expression for $F_{jt}$ is 
$$n\binom{v-3}{k-2}\left[\frac{(v-k-1)}{(v-3)}\phi_{tj}-\frac{(kv-2k-v+1)}{(v-3)}\phi_{jt}+(v-1)(\gamma_j-\delta_j)+(\gamma_t-\delta_t)\right].$$
Since, as has been shown, the corresponding expression for $V_j$ is 
$$\frac{nbk(k-1)}{(v-1)}(\gamma_j-\delta_j),$$
it follows that 
$$(v-1)(v-2)(v-k)[(k-1)V_j+V_t]-k(v-k)[V_j+V_t]-v(v-1)(v-2)[(k-1)F_{jt}+F_{tj}]+kv(v-2)[F_{jt}+F_{tj}]$$
is an estimate of $nk(k-2)(v-k)v(v-1)\binom{v-2}{k-2}\phi_{jt}$. 
This expression can be set out in other ways, but it does not appear that any of the alternatives is more convenient from the point of view of practical computation, except when 
k = v−1, when it is easier to take 
$$(v-1)V_j+V_t-v(v-2)F_{jt}$$
as an estimate of $nv(v-1)(v-2)\phi_{jt}$. It may be noted that in another special case, i.e. where 
k = 2, 
$$(k-1)V_j+V_t=V_j+V_t=F_{jt}+F_{tj}=(k-1)F_{jt}+F_{tj}$$
and the expression equals zero, as does the coefficient of $\phi_{jt}$. It is for this reason that the 
condition $k\ge3$ was made. 
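
The general expression for the interaction estimate can be checked numerically. The sketch below builds error-free data for a hypothetical unreduced design with v = 5, k = 3, n = 1, under the assumption that the figure for treatment j in a block is the general parameter plus the block parameter plus the local effect of j plus, for every other treatment i in the block, its remote effect and the interaction of j with i. All parameter values are invented for the check; the interaction array is given a circulant pattern so that its rows and columns sum to zero.

```python
from itertools import combinations
from math import comb

v, k, n = 5, 3, 1                      # hypothetical unreduced design
trts = range(v)
gamma = [2.0, -1.0, 0.5, -1.5, 0.0]    # local effects, summing to zero
delta = [0.3, -0.2, 0.4, -0.1, -0.4]   # remote effects, summing to zero
c = [0.0, 1.0, -2.0, 3.0, -2.0]        # circulant pattern, summing to zero
phi = [[c[(t - j) % v] for t in trts] for j in trts]

blocks = [blk for blk in combinations(trts, k) for _ in range(n)]
y = {}
for h, blk in enumerate(blocks):
    for j in blk:
        remote = sum(delta[i] + phi[j][i] for i in blk if i != j)
        y[h, j] = 10.0 + 0.1 * h + gamma[j] + remote   # arbitrary block effects

B = [sum(y[h, j] for j in blk) for h, blk in enumerate(blocks)]
G = sum(B)
T = [sum(y[h, j] for h, blk in enumerate(blocks) if j in blk) for j in trts]
U = [sum(B[h] for h, blk in enumerate(blocks) if j not in blk) for j in trts]
V = [k * T[j] + U[j] - G for j in trts]


def F(j, t):
    """F_jt = k D_jt + E_jt - U_t, computed directly from the data."""
    D = sum(y[h, j] for h, blk in enumerate(blocks) if j in blk and t not in blk)
    E = sum(B[h] for h, blk in enumerate(blocks) if j not in blk and t not in blk)
    return k * D + E - U[t]


def phi_hat(j, t):
    """Interaction estimate from the general expression in the text."""
    x = ((v - 1) * (v - 2) * (v - k) * ((k - 1) * V[j] + V[t])
         - k * (v - k) * (V[j] + V[t])
         - v * (v - 1) * (v - 2) * ((k - 1) * F(j, t) + F(t, j))
         + k * v * (v - 2) * (F(j, t) + F(t, j)))
    return x / (n * k * (k - 2) * (v - k) * v * (v - 1) * comb(v - 2, k - 2))


max_error = max(abs(phi_hat(j, t) - phi[j][t])
                for j in trts for t in trts if j != t)
print(max_error)
```

Because the data contain no random error, the estimate should return each interaction parameter to within rounding; the divisor vanishes when k = 2, which is why that case is excluded.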

If v−k = 1, a test of the interactions can very readily be made, for in this case the parameters, $\phi_{jt}$, are orthogonal with the parameters, $\beta_h^{*}$, as well as with $(\gamma_j-\delta_j)$. First, an 
analysis of variance is worked out ignoring interactions and taking account only of the 
parameters, $\alpha$, $\beta_h^{*}$ and $(\gamma_j-\delta_j)$; let this give an error sum of squares, S', with 
$$[nb(k-1)-(v-1)]$$
degrees of freedom. It is now necessary to evaluate the interaction sum of squares, W, and 
this is best done by multiplying the estimated value of each interaction parameter, $\phi_{jt}$, 
by the total of the data to which it applies, i.e. 
$$W=\sum_{t\ne j}\phi_{jt}(T_j-D_{jt})=-\sum_{t\ne j}\phi_{jt}D_{jt}.$$
The quantity S (= S' − W) now becomes the error sum of squares. Since W has $(v^2-3v+1)$ 
degrees of freedom, S has $[nb(k-1)-v(v-2)]$. A comparison of W and S by the F-test will 
show if there is any evidence of an interaction of local and remote effects. 








If $v-k\ge2$, $\beta_h^{*}$ needs to be adjusted to 
$$\beta_h^{**}=\beta_h^{*}+\frac{1}{k}\sum{}'(\phi_{jt}+\phi_{tj})=\beta_h^{*}+\frac{1}{k}\sum{}''(\phi_{jt}+\phi_{tj}),$$
where $\sum'$ denotes summation where both treatments j and t are present in block h, and $\sum''$ 
denotes similar summation when neither treatment j nor t is present. The choice of summation is a question of convenience. It follows from this that S' will need to be adjusted 
by adding $Z=\sum_h B_h(\beta_h^{*}-\beta_h^{**})$, where $B_h$ is the total of data from block h. S now equals 
S' + Z − W, the degrees of freedom for W and S being as before. 


VALIDITY OF THE TEST FOR INTERACTIONS 
It is not at once evident that the F-test for interactions is unbiased, but examination shows 
it to be so. Suppose that there is in fact no interaction, then the parametric expression for 
$y_{hj}$, the figure given by treatment j in block h, may be written 
$$\alpha+\beta_h^{*}+(\gamma_j-\delta_j)+x_{hj},$$
where $x_{hj}$ is a random residual. Let $T_j'$, $U_j'$, $D_{jt}'$, etc., be functions of the x's corresponding 
to $T_j$, $U_j$, $D_{jt}$, etc., as functions of the y's, then $U_j'=E_{jt}'=G'=0$ and $V_j'=kT_j'$ and $F_{jt}'=kD_{jt}'$. 
Consequently, if $\phi_{jt}'$ be evaluated now, i.e. in the case where there are in fact no interactions, 
$$(v-1)(v-2)(v-k)[(k-1)T_j'+T_t']-k(v-k)[T_j'+T_t']-v(v-1)(v-2)[(k-1)D_{jt}'+D_{tj}']+kv(v-2)[D_{jt}'+D_{tj}']$$
will be taken as the value of 
$$n(k-2)(v-k)v(v-1)\binom{v-2}{k-2}\phi_{jt}'.$$

It remains to evaluate W assuming the null-hypothesis. Since $\phi_{jt}=\phi_{jt}'$, it follows that 
$W=-\sum_{t\ne j}\phi_{jt}'D_{jt}$. Also, $D_{jt}$ equals $D_{jt}'$ plus a linear function of parameters that is determinate 
for each combination of j and t. 

Let $\epsilon$ represent expectation under random permutation of plots within blocks, and let 
$\epsilon(x^2)=\Theta$. Since 
$$\epsilon(T_j')=\epsilon(T_t')=\epsilon(D_{jt}')=\epsilon(D_{tj}')=0,$$
$$\epsilon(T_j'D_{jt}')=\epsilon(D_{jt}'^{\,2})=n\binom{v-2}{k-1}\Theta,$$
$$\epsilon(T_j'D_{tj}')=\epsilon(D_{jt}'D_{tj}')=0,$$
it follows that 
$$\epsilon(W)=\frac{k(v^2-3v+1)}{(k-1)}\Theta,$$
i.e. $k\Theta/(k-1)$ for each degree of freedom, provided the null-hypothesis is true. But this is 
the same as for S', which is the error sum of squares of a design in balanced incomplete 
blocks, and hence it must be the same for S, when this equals S' − W. Thus, by virtue of the 
randomization of treatments within blocks, the F-test between W and S is unbiased, at 
least when v−1 = k. 

In other cases, however, there is the adjustment, Z, to be considered. This has been 
written as a sum of terms of the form $\phi_{jt}B_h$, where neither treatment j nor t occurs in block h. 
Since $\epsilon(B_hT_j')=\epsilon(B_hT_t')=\epsilon(B_hD_{jt}')=\epsilon(B_hD_{tj}')=0$, it follows that $\epsilon(Z)=0$. It thus 
appears that the F-test between W and S is valid for all the designs considered. 




NUMERICAL EXAMPLE 


The data of this example are drawn from the first trial to be designed at East Malling using 
the above approach. As such, it is open to improvement, but is for that reason instructive. 

The experiment was intended to study metaxenia in apples. Metaxenia is the pheno- 
menon of pollen affecting the characteristics of the plant pollinated, and it was proposed 
to test whether it is in fact true that the size of Cox’s Orange Pippin apples depends in part 
on the pollen. In particular, there was a suggestion that pollination by King of the Pippins 
gives large fruit. 


Table 1. Data of experiment on xenia effects in apples 

Figures represent mean fruit diameters in millimetres 

                               Kind of pollen, j
Wave      Block no., h      1        2        3        4      Values of B_h
First          1           ..      18.71    17.01    17.23       52.95
               2          20.42     ..      18.02    19.59       58.03
               3          21.39    20.85     ..      18.94       61.18
               4          19.49    18.48    17.22     ..         55.19
Second         5           ..      23.16    23.33    23.34       69.83
               6          24.67     ..      24.54    25.00       74.21
               7          24.14    21.78     ..      20.86       66.78
               8          26.70    25.75    24.77     ..         77.22
Values of T_j            136.81   128.73   124.89   124.96      515.39 = G
          U_j            122.78   132.24   127.96   132.41
          V_j            +17.82    +3.04   -12.76    -8.10








In view of the great labour of emasculating blossoms and then hand-pollinating them, it 
has previously been the practice to divide the blossom trusses into groups to serve as plots, 
using whole potted trees as blocks. At the suggestion of the writer, in this trial, conducted 
in 1952, a balanced incomplete block design was chosen. Pollen from four varieties was to 
be used (v = 4), namely, 1 = King of the Pippins, 2 = Ellison’s Orange, 3 = James Grieve 
and 4 = Worcester Pearmain. The blossoms of each tree were divided into three groups 
instead of four (k = 3), and there were accordingly four kinds of blocks (b = 4) depending 
on the pollen omitted. Twelve trees were selected; four were brought on by heat treatment, 
four were allowed to develop naturally, and four were retarded by cold, but these last did 
not appear wholly normal and were discarded. In each of the ‘waves’ that were used, one 
tree was allocated at random to make one of each kind of block, and the three pollens 
applied each to one group of blossoms selected at random. There were thus two blocks of 
each kind (n = 2). The grouping of blocks into waves was not considered above, but gives 
rise to only trivial modifications of the standard method; it was arranged in order to spread 
out the labour of applying treatments. Blocks were assigned numbers after randomization, 
and this accounts for the apparently systematic design. 

The figures in Table 1 represent average fruit diameter in millimetres over each group of 
blossoms at 49 days after pollination, together with values of $B_h$, $T_j$, $U_j$ and $V_j$. As a check 
it may be noted that the B's and the T's should sum to G, the U's to (v−k)G, and the V's 
to zero. Now, $(\gamma_j-\delta_j)$ is estimated by $(v-1)V_j/\{nbk(k-1)\}$, which leads to the values set 
out in Table 2. Further, $B_h$ equals 
$$k\alpha+k\beta_h^{*}-[\text{sum of }(\gamma_j-\delta_j)\text{ for treatments lacking in block }h].$$
Since $\alpha=G/nbk$, values of $24\beta_h^{*}$ can be determined as shown in Table 2. Both $(\gamma_j-\delta_j)$ 
and $\beta_h^{*}$ should sum to zero. 


Table 2. Values of $(\gamma_j-\delta_j)$ and $24\beta_h^{*}$ 

j                      1           2           3           4
(γ_j − δ_j)        +1.11375    +0.19000    -0.79750    -0.50625

h              1        2        3        4        5        6        7        8
24β*_h      -82.88   -49.63   -32.33   -77.92   +52.16   +79.81   +12.47   +98.32


Table 3. Values of $D_{jt}$, $F_{jt}$ and $144\phi_{jt}$ 

D_jt         j = 1     j = 2     j = 3     j = 4
t = 1          ..      41.87     40.34     40.57
    2        45.09      ..       42.56     44.59
    3        45.53     42.63      ..       39.80
    4        46.19     44.23     41.99      ..

F_jt         j = 1     j = 2     j = 3     j = 4
t = 1          ..      +2.83     -1.76     -1.07
    2        +3.03      ..       -4.56     +1.53
    3        +8.63     -0.07      ..       -8.56
    4        +6.16     +0.28     -6.44      ..

144φ_jt      j = 1     j = 2     j = 3     j = 4
t = 1          ..     +12.90    -19.14     +6.24
    2       +96.78      ..       +3.72   -100.50
    3       -85.02     -9.24      ..      +94.26
    4       -11.76     -3.66    +15.42      ..


From these values it is possible to work out the residual sum of squares within blocks, 
ignoring interactions. This equals 
$$\sum(\text{data})^2-G^2/nbk-\sum_j T_j(\gamma_j-\delta_j)-\sum_h B_h\beta_h^{*},$$
namely, 6.7350 with 13 degrees of freedom. 
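
The within-block working described above can be retraced from the raw figures. The sketch below is illustrative only: the dictionary layout and variable names are assumptions, and the data are the 24 mean diameters as read from Table 1.

```python
# Block h -> {kind of pollen j: mean fruit diameter in mm}, as read from Table 1.
data = {
    1: {2: 18.71, 3: 17.01, 4: 17.23},
    2: {1: 20.42, 3: 18.02, 4: 19.59},
    3: {1: 21.39, 2: 20.85, 4: 18.94},
    4: {1: 19.49, 2: 18.48, 3: 17.22},
    5: {2: 23.16, 3: 23.33, 4: 23.34},
    6: {1: 24.67, 3: 24.54, 4: 25.00},
    7: {1: 24.14, 2: 21.78, 4: 20.86},
    8: {1: 26.70, 2: 25.75, 3: 24.77},
}
v, k, b, n = 4, 3, 4, 2
pollens = range(1, v + 1)

B = {h: sum(row.values()) for h, row in data.items()}            # block totals
G = sum(B.values())                                              # grand total
T = {j: sum(row[j] for row in data.values() if j in row) for j in pollens}
U = {j: sum(B[h] for h, row in data.items() if j not in row) for j in pollens}
V = {j: k * T[j] + U[j] - G for j in pollens}

# Within-block estimates of (gamma_j - delta_j)
d = {j: (v - 1) * V[j] / (n * b * k * (k - 1)) for j in pollens}

# Adjusted block parameters, from B_h = k*alpha + k*beta_h* - sum of d_j lacking
alpha = G / (n * b * k)
beta_star = {
    h: (B[h] - k * alpha + sum(d[j] for j in pollens if j not in row)) / k
    for h, row in data.items()
}

# Residual sum of squares within blocks, ignoring interactions
sum_sq = sum(y * y for row in data.values() for y in row.values())
residual = (sum_sq - G * G / (n * b * k)
            - sum(T[j] * d[j] for j in pollens)
            - sum(B[h] * beta_star[h] for h in data))
print(round(residual, 4))
```

The residual has nb(k−1) − (v−1) = 13 degrees of freedom.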

To proceed now to the study of possible interactions, the first step is to write down the 
values of $D_{jt}$, as is done in the top part of Table 3. These should sum horizontally to $U_t$ and 
vertically to $(v-k)T_j$. The next values needed are those of $E_{jt}$, but in this instance, where 
v = k+1, they are all zero and need not be considered further. In the general case they 
should sum horizontally to $(v-k-1)U_t$ and vertically to $(v-k-1)U_j$. 

From a knowledge of $U_t$, $D_{jt}$ and $E_{jt}$ it is easy to evaluate $F_{jt}$; the results of doing so are 
shown in the middle part of Table 3, and sum horizontally to zero and vertically to $(v-k)V_j$. 

From these values it is a straightforward matter to obtain estimates of the interaction 
parameters, $\phi_{jt}$, though it is usually easier to estimate some multiple of them. In this 
example, to maintain generality the full expression has been used, and the figures in the 
bottom part of Table 3 are of $144\phi_{jt}$; had the simplified expression for the case, v = k+1, 
been used, the multiple would have been 48. The figures should sum both ways to zero. 
Now, $W=-\sum_{t\ne j}\phi_{jt}D_{jt}$, which in this case comes to 3.5340. It has five degrees of freedom. 
Since v = k+1, no adjustment, Z, is needed. 

This gives rise to the analysis of variance: 

Source                    D.F.     S.S.      M.S.
Interactions                5     3.5340    0.7068
Error (by difference)       8     3.2010    0.4001
Total                      13     6.7350

This shows that the interaction effects are not significant; also there is no suggestive 
regularity in the table of $144\phi_{jt}$, so it was decided to ignore possible interactions. 
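
The interaction sum of squares can be retraced from the raw figures using the simplified expression for v = k+1. In the sketch below the variable names are illustrative, the data are as read from Table 1, and the value of S' is taken as quoted in the text rather than recomputed.

```python
# Block h -> {kind of pollen j: mean fruit diameter in mm}, as read from Table 1.
data = {
    1: {2: 18.71, 3: 17.01, 4: 17.23},
    2: {1: 20.42, 3: 18.02, 4: 19.59},
    3: {1: 21.39, 2: 20.85, 4: 18.94},
    4: {1: 19.49, 2: 18.48, 3: 17.22},
    5: {2: 23.16, 3: 23.33, 4: 23.34},
    6: {1: 24.67, 3: 24.54, 4: 25.00},
    7: {1: 24.14, 2: 21.78, 4: 20.86},
    8: {1: 26.70, 2: 25.75, 3: 24.77},
}
v, k, b, n = 4, 3, 4, 2
pollens = range(1, v + 1)

B = {h: sum(row.values()) for h, row in data.items()}
G = sum(B.values())
T = {j: sum(row[j] for row in data.values() if j in row) for j in pollens}
U = {j: sum(B[h] for h, row in data.items() if j not in row) for j in pollens}
V = {j: k * T[j] + U[j] - G for j in pollens}

# D[j, t]: total for pollen j in blocks from which pollen t is absent
D = {(j, t): sum(row[j] for row in data.values() if j in row and t not in row)
     for j in pollens for t in pollens if j != t}
# F[j, t] = k D_jt + E_jt - U_t, with E_jt = 0 because v = k + 1
F = {(j, t): k * D[j, t] - U[t] for (j, t) in D}
# Simplified estimate for v = k + 1
phi = {(j, t): ((v - 1) * V[j] + V[t] - v * (v - 2) * F[j, t])
       / (n * v * (v - 1) * (v - 2)) for (j, t) in D}

W = -sum(phi[jt] * D[jt] for jt in D)
S_prime = 6.7350                      # residual ignoring interactions (from text)
S = S_prime - W
df_W = v * v - 3 * v + 1
df_S = n * b * (k - 1) - v * (v - 2)
F_ratio = (W / df_W) / (S / df_S)
print(round(W, 4), round(S, 4), round(F_ratio, 2))
```

The variance ratio comes out below two on 5 and 8 degrees of freedom, in keeping with the conclusion that the interactions are not significant.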
It remains to evaluate $\gamma_j$ and $\delta_j$ separately. This is done quite easily: 

$\gamma_1$ = +1.754,   $\delta_1$ = +0.640, 
$\gamma_2$ = −0.439,   $\delta_2$ = −0.629, 
$\gamma_3$ = −0.384,   $\delta_3$ = +0.414, 
$\gamma_4$ = −0.931,   $\delta_4$ = −0.425. 

The values of $\gamma_j$ and $\delta_j$ should both sum to zero. The former series is of great interest, because 
the figures do indeed suggest that pollen from King of the Pippins (j = 1) gives larger fruit 
than do the other pollens. The statistical significance can only be judged from a knowledge 
of $M^{\circ}$ and $M^{+}$, which in turn depend upon M and M'. M' may be evaluated from the analysis 
of variance just given as 0.4001; to find M it is necessary to consider the values $B_h$, thus: 


                  Type of block
               1        2        3        4
1st wave     52.95    58.03    61.18    55.19
2nd wave     69.83    74.21    66.78    77.22


Eliminating both the effects of type of block and of waves, and remembering that each figure is 
the sum of three data, these give an error mean square of 7-9216 with three degrees of freedom. 
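
This between-block error mean square can be retraced as the waves × types interaction of the block totals, put on a single-datum basis by dividing by k. The sketch below uses illustrative variable names and the eight block totals just tabulated.

```python
# Block totals by wave (rows) and type of block (columns).
totals = [
    [52.95, 58.03, 61.18, 55.19],   # 1st wave
    [69.83, 74.21, 66.78, 77.22],   # 2nd wave
]
k = 3                               # each total is the sum of k data
rows, cols = len(totals), len(totals[0])

grand_mean = sum(sum(r) for r in totals) / (rows * cols)
row_means = [sum(r) / cols for r in totals]
col_means = [sum(totals[i][c] for i in range(rows)) / rows for c in range(cols)]

# Sum of squares left after removing wave and type-of-block effects
interaction_ss = sum(
    (totals[i][c] - row_means[i] - col_means[c] + grand_mean) ** 2
    for i in range(rows) for c in range(cols)
)
df = (rows - 1) * (cols - 1)
M = interaction_ss / k / df
print(df, round(M, 4))
```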

Hence, $M^{\circ}$ = 8.1217 and $M^{+}$ = 7.9716, and the standard errors of $(\gamma_j-\gamma_t)$ and $(\delta_j-\delta_t)$ 
are respectively 1.645 and 1.630. Degrees of freedom are very few, but even so the data 
provide some support for the belief that the kind of pollen affects size of fruit, especially 
as the effect observed was nominated beforehand. The values of $\delta_j$ are not so different as to 
demonstrate the existence of remote effects. 


Discussion 


The usefulness of this method depends upon the questions to be answered. If remote effects 
are to be expected and if the aim is to assess the effect of treating whole plants or animals, 
nothing is to be gained by treating parts of them. In such a trial the quantities to be determined are the values of $\alpha+\gamma_j+(k-1)\delta_j$ for the various j's, and the accuracy will in any 
event depend on M and not at all on M'. Further, if whole organisms are used as plots, 
the variance of differences between estimated treatment effects will be $2vM/(nbk)$; whereas 
if they are used as blocks the variance will be $2v(v-1)M/\{nb(v-k)\}$, so the additional labour 
of using small plots may actually have led to a substantial loss of accuracy. 

If, on the other hand, the experimenter is concerned to investigate the mechanism of his 
treatment effects, the proposed method is of especial value. Thus, in a recent trial the remote 
effects proved to be nearly equal to the local ones, and this was of some interest, indicating 
as it did that the local application of a treatment affected all parts of the organism to about 
the same degree. In other circumstances, it may appear that treatments are strictly local 
in their effects, or that remote effects exist, and are or are not proportionate to local ones, 
and if proportionate are of the same or opposite sign. All these phenomena may be suggestive 
of biological modes of action. 

The chief use of the method, however, comes in exploring the desirability of using parts 
of an organism as plots. If it can be shown that treatments of a certain kind have only 
local effects, or have remote effects that are proportionate to the local ones, so that they do 
not invalidate a test, then in future study of such treatments it will be possible to use parts 
of the same animal or plant as plots without regard to remote effects, and to use any con- 
venient block design. Conclusions will then be based on M’ rather than on M with a gain in 
precision that may be considerable, and less material will be needed. Further, in investi- 
gating the possibility of doing this, some information will have been gained as to the effects 
of the treatments, much of it at a useful level of accuracy. 

In using the method described it is clear that the standard error between blocks is as 
important as that within, and it must, therefore, be based on a sufficient number of degrees 
of freedom. This was a fault of the trial from which the example was taken, though here it 
was the original intention to use twelve trees rather than eight. The need to have a reason- 
able number of degrees of freedom for M means that the method does not lead to a large 
economy of material as long as local and remote effects are both studied. Of course, if at 
a later stage it is decided that remote effects can be ignored, and accordingly whole plants 
or animals can be used as blocks without any of the complications discussed in this paper, 
the economies may be very great. 


SUMMARY 


The statistical implications of using organisms as blocks are considered, special attention 
being given to the case of a treatment applied to one plot having a local effect on the plot 
to which it is applied and remote effects elsewhere in the block, and also to the case of 
remote and local effects interacting. A worked example is given, and the usefulness of the 
method is discussed. 


I have been much helped by Mr H. M. Tydeman, who provided the data of the numerical 
example, and Mr G. H. Freeman, who checked the algebra. 


REFERENCE 


Paterson, A. B. & Leech, F. B. (1954). Factors affecting the intradermal tuberculin reaction on the 
guinea pig. Amer. Rev. Tuberc. 69, 806-17. 







THE USE OF A CONCOMITANT VARIABLE IN SELECTING 
AN EXPERIMENTAL DESIGN* 


By D. R. COX 


Department of Biostatistics, School of Public Health, University of North Carolina 
and Department of Statistics, University of California† 


1. INTRODUCTION 

Suppose that we have t treatments for comparison using N experimental units, for example, 
animals, and that on each unit we have a preliminary observation, for example, the body 
weight, available before the treatments are allotted to the units. Suppose that it is expected 
that the final observation of interest, y, will, in the absence of treatment effects, be well 
correlated with the preliminary observation, x. Then a number of standard methods are 
available for exploiting this correlation in order to increase the precision of the estimated 
treatment effects. 

The object of this paper is to compare the methods in some simple situations with a small 
number of experimental units. Attention is restricted to experiments in which each of the 
alternative treatments appears the same number, k, of times, so that N = tk. 


2. SOME STANDARD METHODS 
We list first the following methods for increasing the precision of the treatment comparisons. 

Method I. An index of response is used, for example, y/x, y/x*, y—x, etc. The index may 
be suggested by analysis of previous data or by general consideration of a plausible form 
for the regression relation between y and x. 

Method II. The treatments are completely randomized over the experimental units and 
an adjustment made for regression on x by analysis of covariance. For simplicity only 
linear regression will be considered. 

Method III. The experimental units are ranked in order of increasing x and then grouped 
into k blocks of t units each, the first block, for example, consisting of the t units with lowest 
values of x. A randomized block arrangement is then constructed, based on this grouping 
of the units. 

Method IV. This is applicable when k is equal to, or is a multiple of, t. The units are again 
ranked in order of increasing x and a Latin square arrangement used to control not only 
differences between blocks but also variation associated with order within a block. Thus 
with t = k = 4, N = 16, the units are numbered from 1, ..., 16, in order of increasing x and 
a Latin square set out, as in the following example: 


Order within block 





Block no. 1 2 3 4 
1 1:7, 2:7, 3:7, 4:7, 
2 5:7, 6:7, 7:7, 8:7, 
3 9:7, 10:7, 11:7, 12:7, 
4 13:7, 14:7, 15:7, 16:T, 


* This paper was prepared with the partial support of the Air Research and Development Command, 
under contract with the U.S.A.F. School of Aviation Medicine. 
† Present address, Birkbeck College, University of London. 










A similar method, employing Youden squares, could be employed if k is not a multiple of t, 
but would require a lengthier analysis. 

Method V. Methods II and III may be combined by using the randomized block design 
plus a covariance correction. 

Method VI. It often happens that the preliminary observation x is not the only thing 
associated with the units that can be used to increase precision. One of a number of possibilities is that a grouping into randomized blocks should be made on the basis of the other 
criteria and then a covariance adjustment made for x. 

Method VII. A systematic, or non-randomized, design may be used, chosen, for example, 
to give maximum precision under the hypothesis that the regression equation of y on x 
is a polynomial of low degree (Cox, 1951). The use of such a design has the disadvantages 
attendant on the lack of full randomization. 

An eighth method is that of Papadakis (see, for example, Bartlett (1938)). This is useful 
in large experiments in which there are expected to be trends or serial correlation between 
experimental units adjacent in space or time, but will not be considered here. 


3. BASIS FOR COMPARISON 


Denote the t treatments by $T_1,\ldots,T_t$ and the corresponding estimated treatment means, 
after adjustment by covariance where appropriate, by $\hat y_1,\ldots,\hat y_t$. We measure the random 
errors associated with the design by the variance of the estimated difference between a pair 
of treatments, averaged over all pairs of treatments, i.e. by 
$$V_t=\operatorname*{Ave}_{i,j;\,i\ne j}V(\hat y_i-\hat y_j).\qquad(1)$$
This is a certain multiple of the population residual variance $\sigma_0^2$ and is not affected by errors 
in estimating $\sigma_0^2$. Call (1) the true (average) imprecision of the design; it is slightly different 
from the quantity suggested by Lucas and used by Greenberg (1953). 

Often, however, we are interested in the apparent imprecision, $V_a$, making due allowance 
for the effective loss of information that arises in estimating the residual variance. To obtain 
$V_a$ we use Fisher's factor $(f+3)/(f+1)$, where f is the number of degrees of freedom available 
to estimate the residual variance (Cochran & Cox, 1950, p. 26); that is, we put 
$$V_a=V_t\left(\frac{f+3}{f+1}\right).\qquad(2)$$
This will be supplemented by the rather arbitrary rule that if f < 5 no effective estimate of 
the residual variance is considered possible from the data alone. 

From the point of view of the length of confidence intervals and the sensitivity of significance tests, a situation with $V_a$ = 1 and with the variance estimated from the data is nearly 
equivalent to a situation in which the residual variance is known and $V_t$ = 1. 

The use of the average variance in (1) is a natural step but should be viewed critically, 
particularly in small experiments where there may be substantial variation in precision 
between different randomization patterns and between different comparisons within one 
randomization pattern. This is particularly so with Method II. 







4. CALCULATION OF IMPRECISION 


To compare the different methods it will be assumed that, in the absence of treatment 
effects, y and x have a bivariate normal frequency distribution with correlation coefficient, 
$\rho$, and with the variance of y for fixed x denoted by $\sigma_0^2$. If the effect of variation accounted 
for by x were completely removed, we should have $V_t=2\sigma_0^2/k$. Therefore we write 
$$V_t=\frac{2\sigma_0^2}{k}I_t,\qquad(3)$$
$$V_a=\frac{2\sigma_0^2}{k}I_a,\qquad(4)$$
and use $I_t$ and $I_a$ as indices of true and apparent imprecision. Clearly $I_a>I_t>1$. 
We now compute $I_t$ and $I_a$ for the various procedures set out in § 2. 


Table 1. Loss of precision from using wrong index of response 

            Range of β_0/β within which I_t ≤                      I_t if x
  ρ           1.1               1.2               1.5              ignored
 0.2     (-0.55, 2.55)     (-1.19, 3.19)     (-2.46, 4.46)          1.04
 0.4     (0.28, 1.72)      (-0.02, 2.02)     (-0.62, 2.62)          1.19
 0.6     (0.68, 1.32)      (0.40, 1.60)      (0.06, 1.94)           1.56
 0.8     (0.76, 1.24)      (0.67, 1.33)      (0.47, 1.53)           2.78
 0.9     (0.82, 1.18)      (0.74, 1.26)      (0.59, 1.41)           5.26
 0.95    (0.90, 1.10)      (0.85, 1.15)      (0.77, 1.23)          10.26











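Method I. Suppose that the true regression coefficient of y on x is $\beta$ and that the index 
of response is $y-\beta_0x$. This situation has been considered by Gourlay (1953). It is easily 
shown that the index of true imprecision is 
$$I_t^{(1)}=\begin{cases}\left(1-2\rho^2\dfrac{\beta_0}{\beta}+\rho^2\dfrac{\beta_0^2}{\beta^2}\right)(1-\rho^2)^{-1}&(\rho\ne0),\\[2mm]1+\beta_0^2\sigma_x^2/\sigma_0^2&(\rho=0).\end{cases}\qquad(5)$$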


If no attempt is made to use x, i.e. if $\beta_0=0$, $I_t=(1-\rho^2)^{-1}$. Thus the attempted correction 
is an advantage whenever $0<\beta_0/\beta<2$. Table 1 shows the ranges of values of $\beta_0/\beta$ for which 
$I_t\le$ 1.1, 1.2, 1.5 and so tells us how near $\beta$ and $\beta_0$ need to be to avoid losing a specified amount 
of information. 
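
A minimal sketch of these calculations is given below. The function names are hypothetical; the range is obtained by rewriting (5) for $\rho\ne0$ as $1+\rho^2(1-\beta_0/\beta)^2/(1-\rho^2)$ and solving for the ratio at a given limit.

```python
from math import sqrt


def index_true_imprecision(rho, ratio):
    """I_t for the index y - beta0 * x, where ratio = beta0 / beta (rho != 0)."""
    return (1 - 2 * rho ** 2 * ratio + rho ** 2 * ratio ** 2) / (1 - rho ** 2)


def ratio_range(rho, limit):
    """Range of beta0 / beta within which I_t does not exceed the limit."""
    half_width = sqrt((limit - 1) * (1 - rho ** 2)) / rho
    return 1 - half_width, 1 + half_width


lo, hi = ratio_range(0.4, 1.1)
print(round(lo, 2), round(hi, 2))
print(round(index_true_imprecision(0.95, 0.0), 2))   # x ignored
```

The index is symmetric about $\beta_0/\beta=1$, where it takes its minimum value of unity, so every range in the table is centred on that value.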

Fisher (1935, p. 163) criticized the use of indices of response based on inadequate assessments of the relation between y and x. It is no contradiction of these criticisms to conclude 
from Table 1 that, particularly if $\rho$ is not near unity, $\beta_0$ does not need to be very near $\beta$ to 
give a worth-while increase in precision. The use of non-linear indices such as y/x* allows 
curvilinear regression to be accounted for. Note, however, that if the treatment effects are 
constant, independently of x, on the y scale, they will not be constant when the index is 
used; if the index and the original observation have equal physical significance there may 
well be no reason for expecting one rather than the other to be the appropriate scale for 
measuring treatment effects. However, if the object is the estimation of the effect on the 
long-run average of y of a change in treatments, there may occasionally be difficulties 
connected with a naive use of average of y/x*. 

Method II. Denote the terms of the analysis of covariance as follows: 


Treatments   $T_{xx}$   $T_{xy}$   $T_{yy}$ 
Residual     $R_{xx}$   $R_{xy}$   $R_{yy}$ 
Total Bg, Bi Me 


where, for example, $R_{xy}$ denotes the residual sum of products of x and y. If $\bar x_i$ is the mean of 
x for the ith treatment and $\hat y_i$ the mean of y adjusted for regression on x, 
$$V(\hat y_i-\hat y_j)=\sigma_0^2\left\{\frac{2}{k}+\frac{(\bar x_i-\bar x_j)^2}{R_{xx}}\right\},$$
$$\operatorname*{Ave}_{i,j;\,i\ne j}V(\hat y_i-\hat y_j)=\frac{2\sigma_0^2}{k}\left\{1+\frac{T_{xx}}{(t-1)R_{xx}}\right\}.\qquad(6)$$
This is conditional on fixed x's; but the x's are in fact a random sample from a normal 
population and the second term in brackets is therefore proportional to an F variate. Hence 
if we take expectations and remove the factor $2\sigma_0^2/k$, we obtain for the index of true 
imprecision 
$$I_t^{(2)}=\frac{N-t-1}{N-t-2}.\qquad(7)$$
Also 
$$I_a^{(2)}=I_t^{(2)}\,\frac{N-t+2}{N-t},\qquad(8)$$
since the residual degrees of freedom, after adjusting for regression, are N − t − 1. 
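
Expressions (7) and (8) are simple enough to tabulate directly. The sketch below uses a hypothetical function name and returns the residual degrees of freedom as well, so that the rule about f < 5 can be applied by the reader.

```python
def method2_indices(t, k):
    """Return (I_t, I_a, f) for completely randomized units with covariance.

    t is the number of treatments and k the number of units per treatment.
    """
    N = t * k
    f = N - t - 1                     # residual d.f. after fitting the regression
    if N - t - 2 <= 0:
        raise ValueError("too few units for the expectation in (7) to exist")
    I_t = (N - t - 1) / (N - t - 2)
    I_a = I_t * (f + 3) / (f + 1)     # Fisher's factor, as in (2)
    return I_t, I_a, f


I_t, I_a, f = method2_indices(t=2, k=5)
print(round(I_t, 3), round(I_a, 3), f)
```

Both indices tend to unity as N increases, so the loss from estimating the regression coefficient matters mainly in small experiments.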

Table 2 gives the values of (7) and (8) for various values of t, k with N < 20. 

The increase in variance above that to be expected when the effect of x is completely 
eliminated can be described as due to errors in estimating the residual regression coefficient, 
or as arising from the non-orthogonality present in the linear set-up chosen to represent 
the populations sampled. 

Method III. Consider first one block and a fixed set of x values $x_1,\ldots,x_t$. In the absence 
of treatment effects, the corresponding observations on y are $\beta x_1+\epsilon_1,\ldots,\beta x_t+\epsilon_t$, where 
$\beta$ is the regression coefficient of y on x and $\epsilon_1,\ldots,\epsilon_t$ are independent with constant mean and 
with variance $\sigma_0^2$, and determine the dispersion of y about its regression line on x. Hence 
if two positions are selected randomly to carry treatments $T_i$ and $T_j$, the expected mean-square difference between the resulting observations is 
$$2\sigma_0^2\left\{1+\frac{\beta^2}{\sigma_0^2}\,\frac{\sum(x_i-\bar x)^2}{(t-1)}\right\}.$$
Hence, averaging over the k blocks, and over the distribution of the x's, we have that 
$$I_t^{(3)}=1+\frac{\rho^2}{1-\rho^2}W,\qquad(9)$$
where W is the expected mean square of x within blocks, divided by the variance of x. 





Table 2. Measure of imprecision for various designs

[Table body illegible in this copy.]

The suffixes (2), (3), ... refer to the method of design. $I_t$ denotes a measure of true variance, taking no account of errors in estimating the residual variance; $I_a$ denotes a measure of apparent variance taking account of the residual degrees of freedom. $I = 1$ represents complete elimination of the variation associated with $x$. Cases where there are fewer than 5 residual degrees of freedom are marked by an asterisk. The values given for Method V are lower limits.


We need, in order to calculate $W$, to consider the following. Take an ordered sample $x_{(1)}, \ldots, x_{(N)}$ from a unit normal population. Divide this into blocks as described above and find the mean square within blocks. Then $W$ is the expected value of this mean square and can be calculated for $N \le 20$ from recently published tables of the second moments of order statistics in a normal sample (Teichroew, 1956); Table 2 gives some numerical values derived in this way.
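The definition of $W$ can be illustrated by simulation rather than by order-statistic tables. The sketch below is only an illustration: it assumes that blocks are formed from consecutive runs of $t$ ordered units, takes (9) in the form $1 + \rho^2 W/(1-\rho^2)$, and uses the hypothetical names `estimate_W` and `imprecision_method3`.

```python
import random


def estimate_W(t, k, n_sim=2000, seed=0):
    """Monte Carlo estimate of W for k blocks of t units (N = t * k).

    Assumption: the N ordered x values are cut into k consecutive blocks.
    The x's are unit normal, so the variance of x is 1 and the expected
    within-block mean square needs no further scaling.
    """
    rng = random.Random(seed)
    n_units = t * k
    total = 0.0
    for _ in range(n_sim):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n_units))
        within_ss = 0.0
        for b in range(k):
            block = xs[b * t:(b + 1) * t]
            mean = sum(block) / t
            within_ss += sum((x - mean) ** 2 for x in block)
        total += within_ss / (k * (t - 1))  # within-block mean square
    return total / n_sim


def imprecision_method3(rho, W):
    """Index of true imprecision for Method III, equation (9)."""
    return 1.0 + rho * rho / (1.0 - rho * rho) * W


if __name__ == "__main__":
    W = estimate_W(t=3, k=3)
    print(W, imprecision_method3(0.6, W))
```

A simulated value of this kind is subject to sampling error and is no substitute for the tabulated values.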

The general conclusion from the values of $I_t^{(2)}$ and $I_t^{(3)}$ is that Method III is somewhat better than Method II if $\rho < 0.6$ and that Method II becomes appreciably better than Method III only when $\rho$ is as large as 0.8 or more. It makes little difference if the comparison is based on $I_a$ instead of on $I_t$. In larger experiments with moderate $t$ and large $k$, both methods will be effective in reducing the value of $I_t$ to near unity, except when $\rho$ is very near unity, when the use of Method III will be inadvisable. However, Method III will remain reasonably efficient for any form of smooth regression between $y$ and $x$, not just for linear regression. If the regression is linear, but the distribution of $x$ is leptokurtic, the randomized block method is likely to be relatively less effective because the end-blocks contain units with widely discrepant values of $x$.

Method IV. The argument is similar when a Latin square is used, except that $W'$, equal to the expected residual mean square in the two-way array of $x$'s formed from the rows and columns of the square, replaces $W$. Table 2 gives the value of $W'$, and of $I^{(4)}$ in certain cases. If we have $r$ squares, each of size $t \times t$, the residual degrees of freedom are $(t-1)(rt-r-1)$, when the residual within squares and the treatment $\times$ squares terms are combined. This number is small in the cases examined.
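The residual degrees of freedom quoted for $r$ Latin squares can be checked by simple bookkeeping. The sketch below assumes the usual partition (squares, rows within squares, columns within squares, treatments, and a pooled residual); the function name is hypothetical.

```python
def latin_square_residual_df(r, t):
    """Residual d.f. for r Latin squares of size t x t, with the residual
    within squares pooled with the treatments x squares interaction."""
    total = r * t * t - 1      # all units about the grand mean
    squares = r - 1            # between squares
    rows = r * (t - 1)         # rows within squares
    columns = r * (t - 1)      # columns within squares
    treatments = t - 1         # treatment main effect
    return total - squares - rows - columns - treatments


if __name__ == "__main__":
    for r in (1, 2, 3):
        for t in (3, 4, 5):
            # agrees with the closed form (t - 1)(rt - r - 1)
            print(r, t, latin_square_residual_df(r, t), (t - 1) * (r * t - r - 1))
```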

The additional precision gained by eliminating 'order within blocks' makes the critical value of $\rho$ at which Methods II and IV are approximately equivalent equal to about 0.8.

Method V. This is the use of $x$ simultaneously for blocking and for covariance correction. There are two possibilities. We may analyse the design as a randomized block with covariance, estimating the regression coefficient from the residual line of the analysis of covariance. Equation (6) applies with $R_{xx}$ still defined as the residual sum of squares, this time in the randomized block analysis. We again require the expectation of $T_{xx}/R_{xx}$; over the randomization with the $x$'s fixed

$$E(T_{xx}/R_{xx}) \ge E(T_{xx})/E(R_{xx}) = \frac{1}{k-1}, \tag{10}$$

with near equality in a large design. Hence

$$I_t^{(5)} \ge 1 + \frac{1}{(t-1)(k-1)}, \tag{11}$$

and the right-hand side of (11) will be used as an approximation to $I_t^{(5)}$; the error will be of the order of $[1/\{(t-1)(k-1)\}]^2$, as can be shown by the expansion methods of large-sample theory.

The second possibility is to analyse the design as if it were completely randomized. This 
would be in order if the arrangement into blocks has no effect on the y-values other than 
that due to correlation with the x’s, and if the assumptions of the least squares model can 
be postulated, i.e. if we use more than pure randomization theory. In this case the argument 


parallel to (10) leads to 1 


(5)/ es seer 9 
Is 21+cEray’ (12) 


again with near-equality in a large experiment. 








156 Use of a concomitant variable in experimental design 


The numerical values in Table 2 and a direct comparison of the formulae show that if $k > 3$ the lower limit for $I_t^{(5)}$ is greater than $I_t^{(2)}$. Hence, under the conditions postulated, Method V is inferior to Method II. Even with the second method of analysis, i.e. with (12), it seems unlikely that there is appreciable gain in average precision over Method II, under the conditions assumed.

Similar conclusions are reached from $I_a^{(5)}$ and $I_a^{(5)\prime}$.
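The direct comparison of the formulae can be sketched in exact arithmetic. The code below assumes that (7) reads $(N-t-1)/(N-t-2)$ and that the right-hand side of (11) reads $1 + 1/\{(t-1)(k-1)\}$; both function names are hypothetical.

```python
from fractions import Fraction


def imprecision_method2(N, t):
    """Equation (7): covariance adjustment in a completely randomized design."""
    return Fraction(N - t - 1, N - t - 2)


def imprecision_method5_lower(t, k):
    """Right-hand side of (11): lower limit for blocking on x plus covariance."""
    return 1 + Fraction(1, (t - 1) * (k - 1))


def compare(max_N=20):
    """Return (t, k, I2, lower limit of I5) for all designs with N = t*k <= max_N."""
    rows = []
    for t in range(2, max_N + 1):
        for k in range(2, max_N // t + 1):
            N = t * k
            if N - t - 2 <= 0:      # (7) is undefined without enough residual d.f.
                continue
            rows.append((t, k, imprecision_method2(N, t),
                         imprecision_method5_lower(t, k)))
    return rows


if __name__ == "__main__":
    for t, k, i2, i5 in compare():
        print(t, k, float(i2), float(i5))
```

Under these assumed forms the two quantities coincide at $k = 3$ and the Method V limit exceeds the Method II value for every larger $k$, since $(t-1)(k-1) < t(k-1) - 2$ exactly when $k > 3$.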

Method VI. In this we consider a randomized block design with grouping based on a 
criterion separate from x. Quantitative investigation of this, based, for example, on the 
assumption that x, y and the property determining the grouping have a trivariate normal 
distribution, has not been attempted. We can, however, deal with two limiting cases. The 
system of blocking may be identical to that based on x. Equation (11) is then applicable. 
Or the criterion for grouping may be independent of x, in which case 


$$I_t^{(6)} = \frac{(t-1)(k-1)-1}{(t-1)(k-1)-2}. \tag{13}$$

For the smaller values of $(t-1)(k-1)$ among the designs investigated, this is about $1.15\,I_t^{(2)}$, showing that in these cases the additional system of blocking should be included only if there is a reasonable prospect of a reduction of 20% or more in residual variance. For the larger values of $(t-1)(k-1)$, $I_t^{(6)}$ and $I_t^{(2)}$ are very nearly equal.

Method VII. This method is theoretically the most efficient one when the observations 
on y are built up of a polynomial trend on x plus treatment effect plus random error of 
constant mean and dispersion. The method is best illustrated by an example: suppose that 
N=9, t=k=3, that a second degree curve is considered adequate to represent the 
regression of y on x, and that the values of x are in order 


$$-1.3,\ -0.9,\ -0.8,\ -0.5,\ -0.2,\ 0.0,\ 0.4,\ 0.7,\ 1.1.$$
The most systematic procedure is to start by forming first and second degree orthogonal polynomials, $\xi_1'$, $\xi_2'$, from these observations. If $m_r = \sum x^r$, these are

$$\xi_1' = x - m_1/m_0,$$

$$\xi_2' = x^2 - \left(\frac{m_0 m_3 - m_1 m_2}{m_0 m_2 - m_1^2}\right)x - \left(\frac{m_2^2 - m_1 m_3}{m_0 m_2 - m_1^2}\right). \tag{14}$$


The numerical values are 

    $\xi_1'$:  -1.13, -0.73, -0.63, -0.33, -0.03,  0.17,  0.57,  0.87,  1.27,

    $\xi_2'$:   0.89,  0.08, -0.07, -0.40, -0.55, -0.56, -0.32,  0.07,  0.86.

These are then normalized by dividing by $\sqrt{(\sum \xi_i'^2)}$ to give $\xi_1$ and $\xi_2$, namely,

    $\xi_1$:  -0.50, -0.33, -0.28, -0.15, -0.01,  0.08,  0.25,  0.39,  0.57,

    $\xi_2$:   0.57,  0.05, -0.04, -0.26, -0.35, -0.36, -0.21,  0.04,  0.55.

We have to select from the nine units, three to receive $T_1$, etc. Let $\Sigma_1\xi_1$, $\Sigma_1\xi_2$, etc., denote the sum over those units receiving $T_1$ of $\xi_1$, $\xi_2$. This gives us six numbers. The treatment arrangement that minimizes the sum of squares of these six numbers is very nearly, or exactly, the most precise arrangement. Trial and error shows this arrangement to be


    x    -1.3   -0.9   -0.8   -0.5   -0.2    0.0    0.4    0.7    1.1
         T_1    T_3    T_2    T_2    T_3    T_1    T_1    T_3    T_2



















(a non-randomized block design) with

                      $T_1$     $T_2$     $T_3$
    $\Sigma\xi_1$     -0.17      0.14      0.05
    $\Sigma\xi_2$      0.00      0.25     -0.26

The sum of squares of these numbers is $S_{\min} = 0.1811$, and the value of $I_t$ is, in general, approximately (Cox, 1951)

$$I_t^{(7)} \simeq \left\{1 - \frac{S_{\min}}{k(t-1)}\right\}^{-1}, \tag{15}$$

which in this case is 1.03. (The residual degrees of freedom would be 4 if both linear and quadratic trend were removed and 5 if only linear trend is removed.)
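The arithmetic of this example can be retraced in a few lines. The sketch below is illustrative only: it works with unrounded polynomial values (so the criterion differs slightly from the printed 0.1811, which rests on two-decimal values), it searches only arrangements with one treatment per block of three adjacent units, and it assumes that (15) has the form $\{1 - S_{\min}/k(t-1)\}^{-1}$. All function and variable names are hypothetical.

```python
from itertools import permutations
from math import sqrt

x = [-1.3, -0.9, -0.8, -0.5, -0.2, 0.0, 0.4, 0.7, 1.1]
t, k = 3, 3


def orthogonal_polynomials(values):
    """First and second degree orthogonal polynomials, as in equation (14)."""
    m = [sum(v ** r for v in values) for r in range(4)]   # m_r = sum of x^r
    d = m[0] * m[2] - m[1] ** 2
    a = (m[0] * m[3] - m[1] * m[2]) / d
    b = (m[2] ** 2 - m[1] * m[3]) / d
    first = [v - m[1] / m[0] for v in values]
    second = [v * v - a * v - b for v in values]
    return first, second


def normalize(values):
    """Scale so that the sum of squares is one."""
    scale = sqrt(sum(v * v for v in values))
    return [v / scale for v in values]


def criterion(assign, z1, z2):
    """Sum of squares of the 2t treatment sums of z1 and z2."""
    total = 0.0
    for trt in range(t):
        units = [i for i, a in enumerate(assign) if a == trt]
        total += sum(z1[i] for i in units) ** 2 + sum(z2[i] for i in units) ** 2
    return total


xi1_raw, xi2_raw = orthogonal_polynomials(x)
z1, z2 = normalize(xi1_raw), normalize(xi2_raw)

# Arrangement given in the text, with treatments labelled 0, 1, 2:
# T1 T3 T2 | T2 T3 T1 | T1 T3 T2
paper = (0, 2, 1, 1, 2, 0, 0, 2, 1)
S_paper = criterion(paper, z1, z2)

# Exhaustive search over arrangements with each treatment once per block.
best = min(
    (criterion(p1 + p2 + p3, z1, z2), p1 + p2 + p3)
    for p1 in permutations(range(t))
    for p2 in permutations(range(t))
    for p3 in permutations(range(t))
)

I7 = 1.0 / (1.0 - S_paper / (k * (t - 1)))   # assumed form of (15)

if __name__ == "__main__":
    print(S_paper, best, I7)
```

Restricting the search to one treatment per block keeps it to 216 candidates; an unrestricted search over all allocations of three units per treatment would be needed to confirm a global minimum.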

In general the systematic search for the optimum arrangement is tedious, although it is usually possible to find quickly an arrangement that is nearly the best. If the degree of the polynomial is small compared with $k$, it will usually be possible to find an arrangement with $I_t^{(7)}$ negligibly greater than 1. The disadvantage of this method is the lack of randomization; the method is of most value in single small experiments.


5. Discussion 


In deciding what design to use in a particular case, we should consider 
(i) the values for imprecision given above; 

(ii) the extent to which departure from assumed conditions is likely to affect (i); 

(iii) the importance to be attached to simplicity of design, and analysis; 

(iv) the extent to which considerations other than precision are relevant. 

The general conclusion from the calculations in §4 is that the methods based on covariance are preferable to the simpler methods based on blocking only if the correlation coefficient between $y$ and $x$ is at least 0.6, and that under the conditions postulated the systematic design, Method VII, is the most precise. For larger experiments all methods except the first are likely to have $I$ near unity.

The main assumptions in the calculations are the linearity of the regression and the normality of the distribution of $x$. Non-normality of $x$ should have little effect on the efficiency of covariance analysis, while $I$ for the blocking methods will usually be an increasing function of the kurtosis of the distribution of $x$. If the regression is non-linear but smooth, blocking methods will remain effective, while covariance methods will not, unless the linear component accounts for most of the regression, or multiple covariance is used.

The methods of design are all simple except Method VII. Details of analysis are, of 
course, simpler for methods not involving covariance. 

There are two further considerations. The form of the relation between $y$ and $x$ may be of intrinsic interest, either in helping to understand the experimental material or in giving information useful in the design of further experiments. Also we may suspect that the treatment effects are not independent of $x$, i.e. that there is a treatment $\times$ $x$ interaction. Such an interaction may give useful insight into the mechanism underlying the treatment effects and may also change any practical recommendations to be made from the experiment. If these considerations are relevant we shall normally prefer to use $x$ quantitatively.


I am grateful to Dr B. G. Greenberg for very helpful discussion. 










REFERENCES 


Bartlett, M. S. (1938). The approximate recovery of information from replicated experiments with large blocks. J. Agric. Sci. 28, 418-20.

Cochran, W. G. & Cox, G. M. (1950). Experimental Designs. New York: John Wiley and Sons.

Cox, D. R. (1951). Some systematic experimental designs. Biometrika, 38, 312-23.

Fisher, R. A. (1935). Design of Experiments. Edinburgh: Oliver and Boyd.

Gourlay, N. (1953). Covariance analysis and its applications in psychological research. Brit. J. Statist. Psychol. 6, 25-34.

Greenberg, B. G. (1953). Use of covariance and balancing in analytical surveys. Amer. J. Publ. Hlth, 43, 692-9.

Teichroew, D. (1956). Tables of expected values of order statistics and products of order statistics for samples of size twenty and less from the normal distribution. Ann. Math. Statist. 27, 410-26.
















APPROXIMATE CONFIDENCE LIMITS FOR COMPONENTS 
OF VARIANCE 


By M. G. BULMER 
Unit of Biometry, 6 Keble Road, Oxford


1. STATEMENT OF PROBLEM 


Suppose that $M_1$ and $M_2$ are independent mean-square variates with $f_1$ and $f_2$ degrees of freedom and unknown expected values $(\theta + \sigma^2)$ and $\sigma^2$ respectively, where $\theta$ and $\sigma^2$ are non-negative. In other words, $f_1 M_1/(\theta + \sigma^2)$ and $f_2 M_2/\sigma^2$ are independent chi-square variates with $f_1$ and $f_2$ degrees of freedom. The problem considered in this paper is to find confidence limits for $\theta$, the difference between the expected values of the two mean squares; that is to say, we want to find a function of $M_1$ and $M_2$, $f(M_1, M_2)$, such that, at least approximately,

$$\Pr[f(M_1, M_2) < \theta] = \alpha, \tag{1}$$

whatever $\theta$ and $\sigma^2$ are. The function, $f$, will of course also depend on the value of $\alpha$, and we shall be interested in the cases when $\alpha$ is either small or large. If $\alpha = 0.05$, for example, the assertion '$\theta < f(M_1, M_2)$' will be correct 95% of the time and $f$ will be called the upper 95% confidence limit for $\theta$; if, on the other hand, $\alpha = 0.95$, the assertion '$\theta > f(M_1, M_2)$' will be correct 95% of the time and $f$ will be called the lower 95% confidence limit for $\theta$. The interval between the lower and upper 97½% limits is a 95% confidence interval for $\theta$.

Such confidence limits will often be useful for estimating the components of variance 
which arise in the infinite model of the analysis of variance. Approximate solutions of this 
problem have been given in four recent papers (Bartlett, 1953; Green, 1954; Huitson, 1955; 
Welch, 1956). A different approach is adopted in this paper which leads to a solution, given 
in equation (6), at once simple and reasonably accurate. In the second half of the paper, the 
accuracy of this solution is compared with that of the other approximations which have been 
suggested. 


2. AN APPROXIMATE SOLUTION OF THE PROBLEM 


It is reasonable to suppose that the confidence limit, $f(M_1, M_2)$, can be written in the form $M_2\,g(F)$, where $F = M_1/M_2$; for if all the observations underlying the analysis are multiplied by a constant, $c$, then $M_1$, $M_2$ and $\theta$ are each multiplied by $c^2$ and the confidence limit should be multiplied by $c^2$ too. Therefore $f(c^2 M_1, c^2 M_2) = c^2 f(M_1, M_2)$ for any $c$; the result follows if we put $c^2 = 1/M_2$. We must therefore find a function, $g$, such that

$$\Pr[M_2\,g(F) < \theta] = \alpha, \tag{2}$$

whatever $\theta$ and $\sigma^2$ are. But

$$\Pr[M_2\,g(F) < \theta] = \Pr[F < g^{-1}(\theta/M_2)] = \Pr[M_1 < M_2\,g^{-1}(\theta/M_2)],$$

where $g^{-1}$ is the inverse function of $g$; it is assumed that $g(F)$ is a continuous and increasing function of $F$ and that $g^{-1}$ therefore exists. Now write $m_1 = M_1/(\theta + \sigma^2)$, $m_2 = M_2/\sigma^2$ and $\rho = \theta/\sigma^2$, so that $m_1$ and $m_2$ are standardized mean-square variates with $f_1$ and $f_2$ degrees of freedom. Then

$$\Pr[M_1 < M_2\,g^{-1}(\theta/M_2)] = \Pr\left[m_1 < \frac{m_2\,g^{-1}(\rho/m_2)}{\rho + 1}\right],$$





160 Approximate confidence limits for components of variance 


and we want to find a function $g$ which satisfies the equation

$$\int_0^\infty P_1\!\left[\frac{m_2\,g^{-1}(\rho/m_2)}{\rho + 1}\right] p_2(m_2)\,dm_2 = \alpha \tag{3}$$

for any $\rho$, where $P_1$ is the cumulative distribution function of a standardized mean-square variate with $f_1$ degrees of freedom, and $p_2$ is the frequency function of a standardized mean-square variate with $f_2$ degrees of freedom. If this can be done, then $M_2\,g(F)$ will be a confidence limit for $\theta$.

Two limiting conditions can be placed on $g(F)$. Let us write $L_1$ for the lower $100\alpha$% point of the $F$-distribution with $f_1$ and $f_2$ degrees of freedom and $L_2$ for the corresponding lower $100\alpha$% point with $f_1$ and $\infty$ degrees of freedom.* Then it is obvious that the limit $M_2\,g(F)$ should be zero when $F = L_1$. Also $g(F) \sim F/L_2$ as $F \to \infty$, since

$$\Pr[M_2 F/L_2 < \theta] = \Pr[M_1/\theta < L_2] \to \alpha \quad \text{as } \rho \to \infty,$$

and since $F$ will almost certainly not be very large unless $\rho$ is also very large. Thus we want to find a function, $g(F)$, which satisfies the conditions

$$g(L_1) = 0, \qquad g(F) \sim F/L_2 \ \text{ as } F \to \infty. \tag{4}$$

Any function satisfying these conditions will give an exact confidence limit when $\rho = 0$ and when $\rho \to \infty$.


The simplest function satisfying these conditions is the linear approximation, $g_1(F)$:

$$g_1(F) = (F - L_1)/L_2. \tag{5}$$

This is also an exact solution of the problem for all $\rho$ when $f_2$ is infinite ($f_1$ remaining finite); for values of $\rho$ between 0 and infinity it gets worse as an approximation the smaller $f_2$ is. The adequacy of this approximation was tested by direct numerical integration of (3), with $g_1^{-1}$ substituted for $g^{-1}$, and both the lower and the upper limits were found to be too wide. I therefore introduced a term in $F^{-1}$ and tried as a second approximation

$$g_2(F) = \frac{F}{L_2} - 1 + \frac{L_1}{F}\left(1 - \frac{L_1}{L_2}\right). \tag{6}$$

This approximation satisfies the conditions of (4); the constant $-1$ was chosen to ensure that it reduces to the linear approximation when $f_2$ is infinite and the linear approximation is exact. It is shown in the next section that the error $(P - \alpha)$, where $P = \Pr[M_2\,g_2(F) < \theta]$, is of order $(f_1/f_2)^2$. The approximation $M_2\,g_2(F)$ is the one considered in the rest of this paper.

When $F$ is small, negative limits will sometimes be obtained for $\theta$. It is suggested that in this case the limit should be taken as zero. If this is done the upper limit can never be less than $\theta$ when $\theta = 0$ and the information that $\theta$ is non-negative has been wasted. This disadvantage is common to all methods which find a confidence interval for $\theta$ by means of upper and lower limits with separately fixed significance levels.
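The two approximations and the truncation at zero can be written down directly. The sketch below assumes that (6) reads $g_2(F) = F/L_2 - 1 + (L_1/F)(1 - L_1/L_2)$ and that the percentage points $L_1$ and $L_2$ are supplied by the user from tables of $F$; the function names and the percentage points in the usage lines are hypothetical, approximate values chosen only for illustration.

```python
from math import sqrt


def g1(F, L1, L2):
    """Linear approximation, equation (5)."""
    return (F - L1) / L2


def g2(F, L1, L2):
    """Second approximation, equation (6), with its term in 1/F."""
    return F / L2 - 1.0 + (L1 / F) * (1.0 - L1 / L2)


def g2_inverse(v, L1, L2):
    """Solve g2(F) = v for F, taking the positive square root."""
    half = 0.5 * L2 * (1.0 + v)
    return half + sqrt(half * half - L1 * (L2 - L1))


def confidence_limit(M1, M2, L1, L2):
    """M2 * g2(F) with F = M1 / M2, negative limits being replaced by zero."""
    F = M1 / M2
    return max(0.0, M2 * g2(F, L1, L2))


if __name__ == "__main__":
    # Hypothetical, approximate percentage points for a lower 97.5% limit.
    L1, L2 = 1.79, 1.655
    print(confidence_limit(4.61, 0.304, L1, L2))
    print(confidence_limit(0.30, 0.304, L1, L2))   # truncated at zero
```

Solving the quadratic in closed form, as `g2_inverse` does, is convenient when the inverse function has to be evaluated repeatedly inside a numerical integration.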


* It is important to notice that when we are considering, for example, the lower 95% limit for $\theta$, then $\alpha = 0.95$ and $L_1$ is the upper 5% point of $F$ with $f_1$ and $f_2$ degrees of freedom; but that when we are considering the upper 95% limit for $\theta$, then $\alpha = 0.05$ and $L_1$ is the lower 5% point of $F$ with $f_1$ and $f_2$ degrees of freedom, which is the same as the reciprocal of the upper 5% point of $F$ with $f_2$ and $f_1$ degrees of freedom. Similar remarks apply to $L_2$ with $\infty$ substituted for $f_2$.













M. G. BuLMER 161 


As an example of this method, consider an experiment quoted in Davies (1947). Six samples of penicillin were each grown in turn on the same set of twenty-four Petri dishes, and the diameters of the zones of inhibition of bacterial growth were measured in millimetres. The mean square for differences between dishes was 4.61 and had 23 degrees of freedom and the error mean square was 0.304 with 115 degrees of freedom. If we can regard the dishes as a random sample from a normal universe of dishes with variance $\sigma_D^2$, then

$$\theta = E(M_1) - E(M_2) = 6\sigma_D^2.$$

The lower and upper 97.5% confidence limits for $\theta$ given by $M_2\,g_2(F)$ are 2.48 and 8.80 mm.² respectively; thus the 95% confidence interval for $\sigma_D^2$ is 0.413 to 0.918 mm.².


3. THE ACCURACY OF THE APPROXIMATION 


$P = \Pr[M_2\,g_2(F) < \theta]$ is exactly equal to $\alpha$ when $\rho$ is zero or infinite or when $f_2$ is infinite ($f_1$ being finite). The purpose of this section is to study the behaviour of $P$ for intermediate values of $\rho$. If we write $r = f_1/f_2$, then we can show that, for large $f_1$ and $f_2$, $P$ depends only on $\rho$ and $r$ and not on the actual values of $f_1$ and $f_2$. In fact, if we divide $M_2\,g_2(F) - \theta$ by $\sigma^2$ we get $P = \Pr[z < 0]$, where

$$z = \frac{m_1(\rho + 1)}{L_2} - m_2 + \frac{m_2^2}{m_1(\rho + 1)}\,L_1\left(1 - \frac{L_1}{L_2}\right) - \rho. \tag{7}$$

If we let $f_1$ and $f_2$ tend to infinity while $r = f_1/f_2$ remains constant, then we can write

$$L_1 = 1 - \xi_\alpha(2/f_1 + 2/f_2)^{1/2} \quad\text{and}\quad L_2 = 1 - \xi_\alpha(2/f_1)^{1/2},$$

where $\xi_\alpha$ is the upper $100\alpha$% point of the normal distribution; when $\alpha = 0.05$, for example, $\xi_\alpha = +1.6449$, and when $\alpha = 0.95$, $\xi_\alpha = -1.6449$. We can therefore see that $z$ becomes approximately normally distributed with mean


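$$\xi_\alpha\left\{(\rho + 1)(2/f_1)^{1/2} + \frac{(2/f_1 + 2/f_2)^{1/2} - (2/f_1)^{1/2}}{\rho + 1}\right\} \tag{8}$$

and variance $(\rho + 1)^2(2/f_1) + (2/f_2)$. Thus

$$P \simeq \alpha + \xi_\alpha f(\xi_\alpha)\left[1 - \frac{(\rho + 1)^2 + (1 + r)^{1/2} - 1}{(\rho + 1)\{(\rho + 1)^2 + r\}^{1/2}}\right]. \tag{9}$$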





For fixed $r$, this quantity is at its maximum when $(\rho + 1)^2 = (1 + r)^{1/2} + 1$. Thus the extreme value of $P$ is about

$$\alpha + \xi_\alpha f(\xi_\alpha)\left[1 - \frac{2\{(1 + r)^{3/4} - (1 + r)^{1/4}\}}{r}\right], \tag{10}$$

where $f(\xi_\alpha)$ is the ordinate of the normal curve at $\xi_\alpha$. Thus we can calculate that when $f_1$ and $f_2$ are large and $\alpha = 0.05$, $P$ lies between 0.05 and 0.05252 when $r = 1$; the corresponding maxima for $r = \tfrac12$ and $\tfrac14$ are 0.05087 and 0.05026. It will be noted that $(P - \alpha)$ is of order $r^2$.
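The three maxima just quoted can be reproduced numerically. The sketch below assumes that (10) reads $\alpha + \xi_\alpha f(\xi_\alpha)[1 - 2\{(1+r)^{3/4} - (1+r)^{1/4}\}/r]$; the function name is hypothetical.

```python
from statistics import NormalDist


def extreme_P(alpha, r):
    """Large-sample extreme value of P from equation (10), with r = f1 / f2."""
    normal = NormalDist()
    xi = normal.inv_cdf(1.0 - alpha)      # upper 100*alpha % point
    ordinate = normal.pdf(xi)             # f(xi), ordinate of the normal curve
    bracket = 1.0 - 2.0 * ((1.0 + r) ** 0.75 - (1.0 + r) ** 0.25) / r
    return alpha + xi * ordinate * bracket


if __name__ == "__main__":
    for r in (1.0, 0.5, 0.25):
        print(r, round(extreme_P(0.05, r), 5))
```

For small $r$ the bracket behaves like $r^2/32$, which is the sense in which $(P - \alpha)$ is of order $r^2$.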

Equation (10) gives us the extreme values of $P$ when $f_1$ is large. In order to investigate what happens when $f_1$ is small I have expanded $P$ in the form $\alpha + a_1/f_2 + a_2/f_2^2 + \ldots$. It is shown in the Appendix that $a_1 = 0$ for all $\rho$. It is also shown how $a_2$ can be calculated. The maximum and minimum values of $c = a_2/f_1^2$ (multiplied by 100), taken over all possible values of $\rho$, are given in Table 1, for several values of $f_1$ and $\alpha$. This table gives a good idea of the behaviour of the extreme values of $P$ for all values of $f_1$. When $f_1$ is large, formula (10)



can be expanded to give $\max c$ (for $\alpha < \tfrac12$) or $\min c$ (for $\alpha > \tfrac12$); in either case the other extremum is zero. In the case $\alpha < \tfrac12$, $\min c$ is zero for all $f_1$ and $\max c$ increases steadily to its limiting value as $f_1$ increases. In the case $\alpha > \tfrac12$, the position when $f_1$ is small is the reverse of that when $f_1$ is large; $\min c$ is zero and $\max c$ is positive (and is larger the smaller $f_1$ is). For moderate values of $f_1$ both the maximum and minimum values of $c$ are non-zero.


Table 1. The minimum and maximum values of 100c over all values of $\rho$.
Min 100c for $\alpha < \tfrac12$ is always zero

    f_1     alpha    Max 100c       alpha    Min 100c   Max 100c

      2     0.01       0.00         0.99       0          4.38
      4      .01        .01          .99       0          1.77
      8      .01        .06          .99       0          0.70
     12      .01        .09          .99       0          0.40
     24      .01        .14          .99      -0.05       0.13
     60      .01        .17          .99      -0.12       0.02
    120      .01        .19          .99      -0.15       0
    inf      .01        .19          .99      -0.19       0

      2     0.025      0.00         0.975      0          5.84
      4      .025       .02          .975      0          2.38
      8      .025   [illegible]      .975      0          0.93
     12      .025       .17          .975     -0.02       0.51
     24      .025       .25          .975     -0.13       0.15
     60      .025       .31          .975     -0.26       0.01
    120      .025       .34          .975     -0.30       0
    inf      .025       .36          .975     -0.36       0

      2     0.05       0.00         0.95       0          6.50
      4      .05        .02          .95       0          2.60
      8      .05        .16          .95       0          1.01
     12      .05        .25          .95      -0.08       0.53
     24      .05        .37          .95      -0.27       0.14
     60      .05        .46          .95      -0.42       0.01
    120      .05        .49          .95      -0.47       0
    inf      .05        .53          .95      -0.53       0


























If we write $c_0$ for one of the extreme values of $c$ in Table 1, then the Taylor expansion for the extreme value of $P$ would be $\alpha + c_0 r^2 + \ldots$. However, as $f_1 \to \infty$, $c_0 r^2 \to \xi_\alpha f(\xi_\alpha)\,r^2/32$ (see equation (A 10) of the Appendix), whereas the extreme value of $P$ tends to (10). We might therefore hope to get a better approximation for the extreme value of $P$ from the equation

$$\alpha + 32c_0\left[1 - \frac{2\{(1 + r)^{3/4} - (1 + r)^{1/4}\}}{r}\right], \tag{11}$$

which reduces to (10) when $f_1$ and $f_2$ tend to infinity together. The extreme values of $P$ have been obtained by direct numerical integration of (3) in some cases, and these exact values are compared with the approximations given by (11) in Table 2. The approximation seems to be fairly good. It can be concluded that $P$ will not differ appreciably from $\alpha$ in any case







likely to be met with in practice, and that for any value of $f_1$ the accuracy improves
rapidly as $r$ becomes smaller (roughly in proportion to $r^2$).
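The approximate ranges of Table 2 follow from (11) once $c_0$ is read from Table 1. The sketch below assumes that (11) reads $\alpha + 32c_0[1 - 2\{(1+r)^{3/4} - (1+r)^{1/4}\}/r]$; the function name is hypothetical, and the $c_0$ values in the usage lines are the Table 1 entries divided by 100.

```python
def extreme_P_from_c0(alpha, c0, r):
    """Approximate extreme value of P from equation (11).

    c0 is an extreme value of c from Table 1 (tabulated there as 100c)
    and r = f1 / f2.
    """
    bracket = 1.0 - 2.0 * ((1.0 + r) ** 0.75 - (1.0 + r) ** 0.25) / r
    return alpha + 32.0 * c0 * bracket


if __name__ == "__main__":
    # f1 = 8, f2 = 12, alpha = 0.05: Table 1 gives max 100c = 0.16
    print(extreme_P_from_c0(0.05, 0.0016, 8 / 12))
    # f1 = 24, f2 = 48, alpha = 0.05: Table 1 gives max 100c = 0.37
    print(extreme_P_from_c0(0.05, 0.0037, 24 / 48))
```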


Table 2. Comparison of the extreme values of P obtained by numerical
integration with the approximate values given by equation (11)

    f_1    f_2    alpha       Exact range             Approximate range

      8      6    0.05      0.0500  to 0.0514        0.0500  to 0.0511
      8     12     .05       .05000 to  .05048        .05000 to  .05041
     24     24     .05       .0500  to  .0520         .0500  to  .0518
     24     48     .05       .05000 to  .05064        .05000 to  .05061
      2      6     .95       .9500  to  .9550         .9500  to  .9554
      2     12     .95       .9500  to  .9515         .9500  to  .9515
      8      6     .95       .9500  to  .9586         .9500  to  .9571
      8     12     .95       .9500  to  .9530         .9500  to  .9526
     24     24     .95       .9491  to  .9511         .9487  to  .9507
     24     48     .95       .94962 to  .95030        .94956 to  .95023





4. COMPARISON WITH OTHER APPROXIMATIONS 


$(M_1 - M_2)$ is an unbiased estimator of $\theta$ with variance $2w_1(\theta + \sigma^2)^2 + 2w_2\sigma^4$, where $w_1 = 1/f_1$ and $w_2 = 1/f_2$. If we estimate $(\theta + \sigma^2)^2$ by $M_1^2$ and $\sigma^4$ by $M_2^2$ and regard $(M_1 - M_2)$ as an approximately normal variate, then an approximate confidence limit for $\theta$ is

$$M_1 - M_2 + \xi_\alpha(2w_1 M_1^2 + 2w_2 M_2^2)^{1/2} = M_2[F - 1 + \xi_\alpha(2w_1 F^2 + 2w_2)^{1/2}], \tag{12}$$

where $\xi_\alpha$ is the upper $100\alpha$% point of the normal distribution. This approximation is well known and will be called the first normal approximation. Welch (1956) has developed a series solution, analogous to the Cornish-Fisher expansion, for the general problem of finding confidence limits for linear combinations of several variances. For the special case of the difference between two variances considered in this paper, his first approximation is the first normal approximation given by (12); his second approximation, which will be called the second normal approximation, has an error of order $(w_1^{3/2}, w_2^{3/2})$ and is

$$M_2\left[F - 1 + \xi_\alpha(2w_1 F^2 + 2w_2)^{1/2} + \tfrac23(2\xi_\alpha^2 + 1)\,\frac{w_1^2 F^3 - w_2^2}{w_1 F^2 + w_2}\right], \tag{13}$$
and his third approximation, which will be called the third normal approximation, has an 
error of order (w!, w#) and is 

(wy F? — wi) 
(w, F? + we) 
(wij F4 + w3) 

(w, F? + wy) 





u, [F ~14£, (20, F? + 2u,)t + 2(282 + 1) + 2(2w, F2 + 2u,)-# 


x |- E(w? F? + w2) + (562 +96.) — (1683 + 23¢,) WEP — way } . (14) 


(w, F? + we)® 
The second normal approximation is equivalent to Bartlett’s (1953) approximation to the 
order of magnitude to which it is correct. 

Huitson (1955) has developed an alternative expansion for the general problem of finding 
confidence limits for linear combinations of variances. Welch (1956) has also derived this 
alternative expansion by a slightly different approach. Huitson was mainly concerned with 




the problem of estimating the total variability (that is, the sum of two or more variances), and his method seems entirely satisfactory for this purpose. It is not satisfactory, however, in the case of the difference between two variances considered in this paper. Huitson's expansion, when applied to this special case, becomes

$$\frac{M_2(F - 1)}{1 - \xi_\alpha(2w_1 F^2 + 2w_2)^{1/2}/(F - 1) + \ldots}. \tag{15}$$

It is easily seen that this is an unsuitable approximation near $F = 1$. For consider (15) taken as far as the large sample approximation (i.e. as far as it is actually written explicitly). The limit is zero only when $F = 1$, which contradicts the first condition of (4). Furthermore, the upper limit is infinite for some critical value of $F$, say $F_u$, greater than 1 for which the denominator becomes zero and the lower limit becomes infinite for another value of $F$, say $F_l$, less than 1. When $F_l < F < F_u$, the upper limit is negative and the lower limit is positive! It is true that (15) leads to Welch's first expansion if the denominator $(1 - h)^{-1}$ is expanded in a formal Taylor series; but, when $F$ is near 1, $h$ becomes very large and the Taylor expansion diverges.


Table 3. The limiting probabilities that $\theta$ is greater than the three normal
approximations as either $f_2$ or $\rho$ approaches infinity

             First normal            Second normal           Third normal
             approximation           approximation           approximation
    f_1    a = 0.05   a = 0.95     a = 0.05   a = 0.95     a = 0.05   a = 0.95

     30    0.11392    0.99266      0.06303    0.92551      0.05306    0.95550
     60     .09412     .98377       .05688     .93931       .05120     .95185
    120     .08051     .97544       .05359     .94513       .05045     .95063
    240     .07121     .96870       .05186     .94770       .05017     .95022
    480     .06480     .96354       .05095     .94889       .05006     .95007
    960     .06035     .95973       .05049     .94946       .05002     .95003








The accuracy of the three normal approximations (12), (13) and (14) depends on both $f_1$ and $f_2$ being large; the accuracy of my approximation depends mostly on the ratio $f_1/f_2$ and does not get any better if they are both increased together. We can investigate the accuracy of the normal approximations as either $f_2$ or $\rho$ approaches infinity as follows. When $f_2 \to \infty$, the first normal approximation (12), for example, becomes $M_1 - \sigma^2 + \xi_\alpha M_1(2w_1)^{1/2}$ and

$$P = \Pr[M_1 + \xi_\alpha M_1(2w_1)^{1/2} < \theta + \sigma^2] = \Pr[m_1 < (1 + \xi_\alpha(2w_1)^{1/2})^{-1}],$$

where $m_1$ is a standardized mean-square variate with $f_1$ degrees of freedom. When $f_2$ is finite but $\rho \to \infty$, terms in (12) not involving $F$ can be ignored and

$$P = \Pr[M_1 + \xi_\alpha M_1(2w_1)^{1/2} < \theta] = \Pr[m_1 < (1 + \xi_\alpha(2w_1)^{1/2})^{-1}],$$

as before. Thus the limiting probability as either $f_2$ or $\rho$ tends to infinity can be obtained from tables of the chi-square integral. The limiting values for the second and third normal







approximations corresponding to the reciprocal of $1 + \xi_\alpha(2w_1)^{1/2}$ are easily found to be the reciprocals of

$$1 + \xi_\alpha(2w_1)^{1/2} + \tfrac23(2\xi_\alpha^2 + 1)w_1 \quad\text{and}\quad 1 + \xi_\alpha(2w_1)^{1/2} + \tfrac23(2\xi_\alpha^2 + 1)w_1 + \tfrac1{36}(2w_1)^{3/2}(13\xi_\alpha^3 + 17\xi_\alpha)$$

respectively. The corresponding probabilities for $\alpha = 0.05$ and 0.95 are given in Table 3. We shall certainly not underrate the accuracy of the approximations if we confine ourselves to these special cases. Thus the first normal approximation is rather bad even when $f_1$ is as large as 1000; the second normal approximation will only be better than mine under rather exceptional circumstances (the maximum deviations $|P - \alpha|$ for my approximation are, when $f_1$ is large and $\alpha = 0.05$ or 0.95, about 0.00252, 0.00087 or 0.00026 when $f_1/f_2$ is 1, $\tfrac12$ or $\tfrac14$). The third normal approximation will often be better than mine, but it seems doubtful whether the gain in accuracy is worth the increased labour of evaluating it.
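The limiting probabilities of Table 3 can be recomputed without tables when $f_1$ is even, because the chi-square integral then reduces to a finite Poisson sum. The sketch below assumes the three limiting expressions in the form $1 + \xi_\alpha(2w_1)^{1/2}$, plus $\tfrac23(2\xi_\alpha^2 + 1)w_1$, plus $\tfrac1{36}(2w_1)^{3/2}(13\xi_\alpha^3 + 17\xi_\alpha)$; the function names are hypothetical.

```python
from math import exp
from statistics import NormalDist


def chi2_cdf_even(x, df):
    """Pr[chi-square with df d.f. < x] for even df, via the Poisson sum."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of d.f.")
    lam = x / 2.0
    term, total = 1.0, 1.0            # j = 0 term of the Poisson sum
    for j in range(1, df // 2):
        term *= lam / j
        total += term
    return 1.0 - exp(-lam) * total


def limiting_P(alpha, f1, order):
    """Limiting P for the first, second or third normal approximation."""
    xi = NormalDist().inv_cdf(1.0 - alpha)    # upper 100*alpha % point
    w1 = 1.0 / f1
    denom = 1.0 + xi * (2.0 * w1) ** 0.5
    if order >= 2:
        denom += (2.0 / 3.0) * (2.0 * xi * xi + 1.0) * w1
    if order >= 3:
        denom += (2.0 * w1) ** 1.5 * (13.0 * xi ** 3 + 17.0 * xi) / 36.0
    # m1 = chi-square / f1, so m1 < 1/denom means chi-square < f1/denom
    return chi2_cdf_even(f1 / denom, f1)


if __name__ == "__main__":
    for order in (1, 2, 3):
        print(order, round(limiting_P(0.05, 30, order), 5))
```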

The accuracy of Huitson's approximation (15) for large $\rho$ can be investigated in the same way, although the argument about the limiting probability as $f_2 \to \infty$ no longer holds. It
turns out that if we take Huitson’s approximation as far as the term of order (w?*, w}'), then 

Table 4. The limiting probabilities that $\theta$ is greater than
Huitson's approximation as $\rho$ approaches infinity

             Huitson's first         Huitson's second        Huitson's third
             approximation           approximation           approximation
    f_1    a = 0.05   a = 0.95     a = 0.05   a = 0.95     a = 0.05   a = 0.95

     30    0.03066    0.93826      0.04823    0.95107      0.05009    0.95004
     60     .03725     .94105       .04919     .95058       .05002     .95001
    120     .04145     .94334       .04962     .95031       .05000     .95000
    240     .04416     .94511       .04982     .95016       .05000     .95000
    480     .04598     .94645       .04991     .95008       .05000     .95000
    960     .04720     .94744       .04995     .95004       .05000     .95000


as $\rho \to \infty$, $P \to \Pr[m_1 < C]$ for all $f_2$, where $C$ is the Cornish-Fisher expansion for the percentage point of $m_1$ taken as far as the corresponding term. These limiting probabilities are given in Table 4. It can be seen that for $\rho \to \infty$ Huitson's approximation is considerably better than the normal approximation of the corresponding order. It might, therefore, be possible to obtain a much better approximation by using the normal approximation for small $F$ and Huitson's approximation for large $F$.

Daniels (1939) has proposed a slight modification of the first normal approximation (12). 
For he points out that M? and M3 are not unbiased estimators of (7+ 07)? and o*, whereas 
f,M3(f, +2) and f, M3(f.+ 2) are; he suggests that the sampling variance of (J/, — M,) 
should be estimated by 2M?(f,+2)-!+2M3(f.+2)-! instead of by 2M?f>'+2M3fr'. 
The first normal approximation is, however, so bad that Daniels’s modification makes little 
difference. 

Green (1954) has developed approximate confidence limits by a method more like the one 
adopted in this paper. His solution, however, is not explicit and is incapable of application 
until it has been tabulated. Green gives only one test of the accuracy of his approximation, 





166 Approximate confidence limits for components of variance 


that P = 0-97492 when f, = 8, f, = 50, p = 1 and a = 0-975. From equation (11) it can be 
calculated that P for my approximation varies between 0-97500 and 0-97520 when f, = 8, 
fo = 50 and a = 0-975. 


APPENDIX 
© 1, [ megx*(e/ms) 
F =(- p,| arte Po(Mz) dm, (Al) 


in the form P = a+0,//.+c,/f3+..., and in particular to show that a, = 0. 
Now, solving equation (6) for F in terms of g,(F’) = v, say, we find 


F = g3(v) = $L,(1+)+[}L3(1+0)?-L,(L,—L,)}}. (A 2) 


(The positive square root must be taken since v = g,(F')~ F'/L, for large F, or F ~ L,»v for large v.) 
Expanding (A 2) in powers of 1/f, we have 


We wish to expand 


$L"(1 +0)? + (L?/L) (20+%) ¢: 
fll +v) Yt Q——_—~ +O(fe ), (A 3) 


where L = L,, L’, L” denote the successive derivatives of L, with respect to fy! at f, = 0. Writing 
m, = 1+ (so that the expected value of x is zero) we now find, expanding in powers of f>! and 2, 











gz (v) = L(l+v)+ 

















m29q (pims) _ mtFZl J L+% . =i 
p+l l+p — fo L(l+p)? (1+p)? (1+)! 
1 L’ L/*(2p + p*) a 
=< O(fr?, fr2x, fox), A4 
*alaisast Tae [tOUrite he) (AM 
and, expanding P, about L, we find 
a | m 
M29, (p/m) pL pL? pL p's 
P,) 2a i eee © tn x 4 
{ pti + 14p"t ape” * eI+pe” * 2404p)” 
LP pl’, pL'(1+2p)+p'LL’ wi pL’p* + p’'LL'(1+2p)+ tp" L*L’ 2] 
fz L(1+p)* (1+p)? , (1+p)* 
1[ pL’ ee eel ih siti ieee 
ee ) OS®, ft fa 8 a, oh, A5 
mica (+p) re Se ea 


Here p,p’,..., denote p,(L) = P{(L),p;(L), ..., and we note that P,(L) = a. Substituting this expres- 


sion in (A 1), and noting that | x"p,(m,) dmg is the rth central moment of m., we find 





pL? +pL’ 1 
P=a+ ———— + ——— [$p"14(1+p)+4p"L‘+ 2pL’p? 
FAdltpe? * Ftp) [$p"L°(1 +p) + 4p pl'p 
+ 2p’LL’(1 + 2p) +p"LAL’ + 4pL"(1 +p)? +(pL/L) (2p +p*) +4p'L] + Olfz*). (A6) 
This expression can be thrown into a more convenient form by replacing the derivatives of the per- 


centage point L, L’,... by their expressions in terms of p,p’,.... These relations can be obtained by 
expanding the expression 


2) 

i) P,(L, Mg) po(%2) dm, = & (A7) 
0 

in powers of 1/f, and equating coefficients. The same result can be found simply by putting p = 0 in 


(A 6) (when the left-hand side becomes exactly «). Thus, as remarked previously, the coefficient of 1/f, 
in (A 6) vanishes identically, and we have finally 


Pe 0+ ee [ — $p"L*(p + p*) — 2p’ L*p? + (2p + 3p*) 
fH1+p)* P 


lo" TA 374 
+ (antes PE PE) (apo |4ous*. (a8) 


The quantity ¢ defined in §3 is the second term on the right of (A 8) with the factor 1/f2 replaced by 
1/f?. It will be recollected that L = L, is the lower 100a % point of an F distribution with f, and 
















M. G. BuLMER 167 


∞ degrees of freedom, or of a $\chi^2/f_1$ distribution with $f_1$ degrees of freedom; $p, p', \dots$ represent the ordinate
of this distribution at the point $\chi^2/f_1 = L$, and its derivatives with respect to $L$. The latter quantities can
be conveniently evaluated using Molina’s (1945) tables of the Poisson distribution.

Now consider the case where $f_1$ is large. Then the above $\chi^2/f_1$ distribution becomes nearly normal,
with mean 1 and variance $2/f_1$, so that $L$ is approximately $1-\xi_\alpha(2/f_1)^{1/2}$, where $\xi_\alpha$ is the upper 100α %
normal deviate. If we write $f(\xi_\alpha)$ for the ordinate of the standardized normal curve at $\xi_\alpha$, we have
approximately

- r P= (df fE), p= (bf) AE) SEQ)» 
where the H, are Hermite polynomials. Making these substitutions in the expression for ¢ provided 
by (A 8), we easily find 12p+p? 
Se eee I +O =). 
8 (1p) EF (Eq) + OF, *) (A9) 
an expression also obtainable from (9). This function has a maximum at p =,/2—1, when it takes the 
sigs maxe = wEf(f) (fi large), (A.10) 


as indicated at the end of §3. This result can also be obtained from equation (10). 


I should like to thank Mr A. M. Walker for helpful discussions; I should also like to thank 
the referee for a valuable criticism of the first draft of this paper. 


REFERENCES 


BARTLETT, M. S. (1953). Biometrika, 40, 306-17.
DANIELS, H. E. (1939). Suppl. J.R. Statist. Soc. 6, 186-97.
DAVIES, O. L. (1947). Statistical Methods in Research and Production. London: Oliver and Boyd.
GREEN, J. R. (1954). Ann. Math. Statist. 25, 671-86.
HUITSON, A. (1955). Biometrika, 42, 471-9.
MOLINA, E. C. (1948). Poisson’s Exponential Binomial Limit. New York: Van Nostrand.
WELCH, B. L. (1956). J. Amer. Statist. Ass. 51, 132-48.


Editorial Note. The accuracy of Green’s solution has been further investigated in a Thesis 
entitled ‘A confidence interval for variance components’ by S. Lipton, presented for the 
Degree of M.Sc. of Liverpool University in 1955. 










MULTIPLE RUNS 


By D. E. BARTON and F. N. DAVID
University College, London 


1. The number of ways in which $r_1$ elements of one kind and $r_2$ of another can be arranged
in a line to form a sequence of $2t$ or $2t+1$ groups was found by Whitworth (1886, Problems
193 and 194), and it was probably not new to him. The solution was revived by Stevens
(1939), Wald & Wolfowitz (1940) and Mood (1940). Sequences in which there are k different
kinds of elements have also been treated. Whitworth takes three kinds of elements with the
same number of each while Mood obtains a general solution for the distribution of the
number of runs of one kind, given k types of element. Using a simple generalization of
Whitworth’s method we show here how the distributions of the total number of runs can
be built up for the multiple case from that for two alternatives. The method of the
characteristic random variable is used to obtain reasonably compact expressions for the moments
of the total number of multiple runs. The assumption of normality for this total number
is not discussed by us at length, since it obviously follows from the work of Wald & Wolfowitz
and of Mood. We show, however, an alternative limit which is Poisson.

2. Starting with the two alternatives case it is assumed that there are $r_1$ white balls and
$r_2$ red balls. If they are arranged in a line randomly, then $T$, the total number of groups
(or runs) will be made up of $t$ white groups $+\,t$ red groups, or $(t+1)$ white and $t$ red, or $t$
white and $(t+1)$ red. The number in the fundamental probability set is

\[ {}^{r}C_{r_1}, \]

where $r = r_1 + r_2$.
The probability distribution of $T$ is

\[ P\{T = 2t\} = 2\;{}^{r_1-1}C_{t-1}\;{}^{r_2-1}C_{t-1}\big/{}^{r}C_{r_1}, \]

and

\[ P\{T = 2t+1\} = P\{T = 2t\}\left(\frac{r-2t}{2t}\right). \]

Now let there be $r_1$ white, $r_2$ red and $r_3$ black balls, and suppose it is required to find the
probability distribution of

\[ T = t_1 + t_2 + t_3, \]

where $t_1$ is the number of white groups, $t_2$ the number of red and $t_3$ the number of black groups,
respectively. The number of ways in which $r_i$ $(i = 1, 2, 3)$ can be split into $t_i$ groups is

\[ {}^{r_i-1}C_{t_i-1}, \]

so the total number of ways is

\[ \prod_{i=1}^{3} {}^{r_i-1}C_{t_i-1}. \]

We now consider the $t_i$ as units and look for the number of ways in which they can be
arranged along a line so that no two like colours are together. Following Whitworth we
take the $t_2$ and $t_3$ groups and put them down in any order. Suppose that there is a total of
$x$ contacts of the same colour (RR or BB). There will be $t_2+t_3-1$ total contacts so that the
number of RB or BR contacts will be $t_2+t_3-1-x$. Now take up the $t_1$ white groups. $x$ of
them will have to go between the RR and BB contacts leaving $t_1-x$ to be placed between the
BR and RB contacts or at either end of the line. This can be done in

\[ {}^{t_2+t_3+1-x}C_{t_1-x} \]

ways. The number of ways in which $t_2$ and $t_3$ can be arranged to produce $t_2+t_3-1-x$
RB or BR contacts is the number of ways in which $t_2$ and $t_3$ can be arranged to produce
$t_2+t_3-x = G$ (say) groups. The number of ways for this, depending on whether $G$ is even or odd, is
quoted above. The total number of ways of producing

\[ T = t_1 + t_2 + t_3 \]

will be then

\[ \sum \prod_{i=1}^{3} {}^{r_i-1}C_{t_i-1} \sum_{x} \Big[\, {}^{t_2+t_3}C_{t_2}\; P\{G = t_2+t_3-x\}\; {}^{t_2+t_3+1-x}C_{t_1-x} \Big], \]

where the inner sum is taken over all possible values of $x$ and the outer sum is over all possible
3-partitions of $T$. There are restrictions on both $x$ and $T$ in order that the combinatorial
coefficient shall have sense, but these are obvious in any calculation. The number in the
fundamental probability set is

\[ \frac{r!}{r_1!\,r_2!\,r_3!}, \]

where $r$ is the sum of the $r_i$. The probability of obtaining a given number of runs is just the
ratio of the two expressions.
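
The three-colour count of §2 can be checked numerically. The following is a minimal sketch, not the authors' own computation: it assumes the count is read as a sum over $(t_1, t_2, t_3)$ and $x$ of the split factor, the number of two-colour unit arrangements with $t_2+t_3-x$ groups, and the placement factor for the white groups. The function names are illustrative only, and a brute-force enumeration is included as a cross-check.

```python
from itertools import permutations
from math import comb


def two_colour_ways(a, b, g):
    """Number of arrangements of a units of one colour and b of another forming g groups."""
    if a < 1 or b < 1 or g < 2:
        return 0
    t, odd = divmod(g, 2)
    if odd:  # g = 2t + 1: (t + 1) groups of one colour and t of the other
        return comb(a - 1, t) * comb(b - 1, t - 1) + comb(a - 1, t - 1) * comb(b - 1, t)
    return 2 * comb(a - 1, t - 1) * comb(b - 1, t - 1)  # g = 2t


def three_colour_counts(r1, r2, r3):
    """Map T -> number of distinct arrangements of r1, r2, r3 balls having T runs."""
    counts = {}
    for t1 in range(1, r1 + 1):
        for t2 in range(1, r2 + 1):
            for t3 in range(1, r3 + 1):
                # ways of splitting each colour into its groups
                splits = comb(r1 - 1, t1 - 1) * comb(r2 - 1, t2 - 1) * comb(r3 - 1, t3 - 1)
                ways = 0
                # x = like-colour contacts among the red and black groups
                for x in range(min(t1, t2 + t3 - 2) + 1):
                    g = t2 + t3 - x
                    ways += two_colour_ways(t2, t3, g) * comb(t2 + t3 + 1 - x, t1 - x)
                if ways:
                    total = t1 + t2 + t3
                    counts[total] = counts.get(total, 0) + splits * ways
    return counts


def brute_force_counts(r1, r2, r3):
    """Same mapping obtained by listing every distinct arrangement (small cases only)."""
    balls = "W" * r1 + "R" * r2 + "B" * r3
    counts = {}
    for seq in set(permutations(balls)):
        runs = 1 + sum(1 for u, v in zip(seq, seq[1:]) if u != v)
        counts[runs] = counts.get(runs, 0) + 1
    return counts


if __name__ == "__main__":
    print(three_colour_counts(2, 2, 2))
    print(brute_force_counts(2, 2, 2))
```

Dividing each count by $r!/(r_1!\,r_2!\,r_3!)$ gives the probability of the corresponding number of runs.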


3. The distributions for four and more colours can be built up successively from the
distribution for three. Let there be $r_i$ $(i = 1, 2, \dots, k)$ with $r$ the sum of the $r_i$. Let there be
$t_i$ $(i = 1, 2, \dots, k)$ groups respectively and suppose that

\[ P(T) = P\Big(\sum_{i=1}^{k} t_i\Big) \]

is required. The number of ways of forming $T$ groups is

\[ \prod_{i=1}^{k} {}^{r_i-1}C_{t_i-1}. \]

Arrange

\[ T' = \sum_{i=1}^{k-1} t_i = T - t_k \]

in any random order and let there be $x$ total contacts of self-colours, i.e. $x$ is the total
number of white-white, red-red, black-black, ..., contacts. Since there are $T'-1$ contacts
in all there will be $T'-1-x$ contacts of different colours. Take the $t_k$ groups which so far
have not been arranged. Put $x$ of these in the $x$ self-colour contacts and arrange the remaining
$t_k-x$ in the $T'-1-x$ different-colour contacts, or at the ends of the line. The number of
ways of doing this will be

\[ {}^{T'+1-x}C_{t_k-x}. \]

The number of ways in which $T'$ elements can be arranged to form $T'-x$ groups is obtained
from the distribution of runs of $k-1$ colours and we have

\[ P\{T = T_k\} = \frac{\prod_{i=1}^{k} r_i!}{r!} \sum \prod_{i=1}^{k} {}^{r_i-1}C_{t_i-1} \sum_{x} \frac{T'!}{\prod_{i=1}^{k-1} t_i!}\; P\{T_{k-1} = T'-x\}\; {}^{T'+1-x}C_{t_k-x}, \]

where $T_{k-1}$ is the number of runs formed by the $t_1, \dots, t_{k-1}$ groups regarded as elements of
$k-1$ colours, the first sum is over all possible k-partitions of $T_k$, and the second sum over all possible
$x$. The restrictions on both $x$ and the partitions of $T_k$ become obvious in calculation and the
distributions can be calculated surprisingly quickly. Using the formulae of this and of the
preceding section the distributions for r = 3,...,12 and k = 3,4 were found and are given
in Tables 1 and 2. More extensive calculations did not appear profitable in the light of
approximations discussed later.
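
The successive build-up from $k-1$ to $k$ colours lends itself to a short recursive routine. The sketch below is one possible reading of the procedure rather than the authors' own working: it counts arrangements instead of probabilities, and the name `run_counts` and the use of memoization are illustrative choices.

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def run_counts(r):
    """Map T -> number of distinct arrangements with T runs.

    r is a tuple (r_1, ..., r_k) giving the number of balls of each colour.
    The returned dict is cached, so callers should not modify it.
    """
    if len(r) == 1:
        return {1: 1}  # a single colour always forms one run
    counts = {}
    for t in product(*(range(1, ri + 1) for ri in r)):
        # ways of splitting every colour into its t_i groups
        splits = 1
        for ri, ti in zip(r, t):
            splits *= comb(ri - 1, ti - 1)
        t_prime = sum(t[:-1])  # groups of the first k - 1 colours
        ways = 0
        # arrangements of those groups, classified by the number of runs they form
        for groups, n in run_counts(t[:-1]).items():
            x = t_prime - groups  # self-colour contacts to be separated
            if x <= t[-1]:
                ways += n * comb(t_prime + 1 - x, t[-1] - x)
        if ways:
            total = sum(t)
            counts[total] = counts.get(total, 0) + splits * ways
    return counts


if __name__ == "__main__":
    table = run_counts((3, 3, 3, 3))
    for runs in sorted(table):
        print(runs, table[runs])
```

For four colours with three balls each, the counts for T = 12, 11, 10 come out as 41304, 98688 and 109632 out of 369600, agreeing with the true probabilities 0.1118, 0.2670 and 0.2966 quoted for S = 0, 1, 2 in Table 3.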


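4. The moments of $T$, the total number of runs of all colours, follow directly from first
principles. Define a characteristic random variable $\alpha_t$ which has the property that it equals
unity when the balls on either side of the $t$th gap are the same colour and zero when they are
of different colours. Define $S$ by

\[ S = \sum_{t=1}^{r-1} \alpha_t. \]

The moments of $S$ can now, in principle, be written down. Thus

\[ \mathcal{E}(S) = (r-1) \sum_{i=1}^{k} \frac{r_i(r_i-1)}{r(r-1)}. \]

We shall write generally

\[ F_j = \sum_{i=1}^{k} r_i^{(j)}, \qquad r_i^{(j)} = r_i(r_i-1)\dots(r_i-j+1), \]

so that in this notation $r\,\mathcal{E}(S) = F_2$.
Again

\[ \mathcal{E}[S(S-1)] = \mathcal{E}\Big(\sum_{t=1}^{r-1} \alpha_t \sum_{u=1}^{r-1} \alpha_u\Big) - \mathcal{E}(S). \]

Because $\alpha_t$ is a characteristic random variable

\[ \mathcal{E}(\alpha_t^2) = \mathcal{E}(\alpha_t), \]

so that the terms with $t = u$ cancel $\mathcal{E}(S)$, but it will be necessary to divide the remaining double sum into two parts:

\[ \mathcal{E} \sum_{t \ne u} \alpha_t \alpha_u = \mathcal{E} \sum_{|t-u| = 1} \alpha_t \alpha_u + \mathcal{E} \sum_{|t-u| \ge 2} \alpha_t \alpha_u. \]

The expected values of the products are

\[ \mathcal{E}(\alpha_t \alpha_{t+1}) = \sum_{i=1}^{k} \frac{r_i(r_i-1)(r_i-2)}{r(r-1)(r-2)}, \]

and, when the suffices differ by two or more,

\[ \mathcal{E}(\alpha_t \alpha_u) = \sum_{i \ne j} \frac{r_i(r_i-1)\,r_j(r_j-1)}{r(r-1)(r-2)(r-3)} + \sum_{i=1}^{k} \frac{r_i(r_i-1)(r_i-2)(r_i-3)}{r(r-1)(r-2)(r-3)}, \]

there being $2(r-2)$ products in which the suffices of the $\alpha$’s differ by unity and $(r-2)(r-3)$
products in which the suffices differ by two or more. From David & Kendall’s tables of
symmetric functions the double sums in the expectations can be eliminated and we have
finally

\[ r^{(2)} \mu_{[2]}(S) = F_2(F_2-2) - 2F_3. \]

By a similar process we obtain the third and fourth factorial moments of $S$ which are

\[ r^{(3)} \mu_{[3]}(S) = F_2(F_2-2)(F_2-4) - 6F_3(F_2-3) + 10F_3 + 10F_4, \]

\[ r^{(4)} \mu_{[4]}(S) = F_2(F_2-2)(F_2-4)(F_2-6) - 12F_3(F_2-3)(F_2-5) + 40F_4(F_2-4) + 40F_3(F_2-3) + 12F_3(F_3-3) - 84F_5 - 296F_4 - 120F_3. \]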
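
The factorial moments of §4 can be checked against a direct enumeration for a small set of balls. The sketch below assumes that $F_j$ is the sum of the descending factorials $r_i(r_i-1)\dots(r_i-j+1)$ and that the four moment formulas are read with the subscripts written out in the code; the function names are illustrative. Exact rational arithmetic is used so that the comparison is free of rounding.

```python
from fractions import Fraction
from itertools import permutations


def falling(x, j):
    """Descending factorial x(x - 1)...(x - j + 1)."""
    out = 1
    for i in range(j):
        out *= x - i
    return out


def factorial_moments_formula(colours):
    """First four factorial moments of S from the F_j (needs at least 4 balls)."""
    r = sum(colours)
    F2, F3, F4, F5 = (sum(falling(ri, j) for ri in colours) for j in (2, 3, 4, 5))
    m1 = Fraction(F2, r)
    m2 = Fraction(F2 * (F2 - 2) - 2 * F3, falling(r, 2))
    m3 = Fraction(F2 * (F2 - 2) * (F2 - 4) - 6 * F3 * (F2 - 3) + 10 * F3 + 10 * F4,
                  falling(r, 3))
    m4 = Fraction(
        F2 * (F2 - 2) * (F2 - 4) * (F2 - 6) - 12 * F3 * (F2 - 3) * (F2 - 5)
        + 40 * F4 * (F2 - 4) + 40 * F3 * (F2 - 3) + 12 * F3 * (F3 - 3)
        - 84 * F5 - 296 * F4 - 120 * F3,
        falling(r, 4),
    )
    return [m1, m2, m3, m4]


def factorial_moments_enumerated(colours):
    """Same moments by averaging over all orderings of the labelled balls."""
    balls = [c for c, ri in enumerate(colours) for _ in range(ri)]
    totals = [0, 0, 0, 0]
    count = 0
    for seq in permutations(balls):
        s = sum(1 for u, v in zip(seq, seq[1:]) if u == v)  # like-colour contacts
        for j in range(4):
            totals[j] += falling(s, j + 1)
        count += 1
    return [Fraction(t, count) for t in totals]


if __name__ == "__main__":
    print(factorial_moments_formula((3, 2, 2)))
    print(factorial_moments_enumerated((3, 2, 2)))
```

Averaging over all orderings of labelled balls weights every distinct colour sequence equally, so it reproduces the random arrangement assumed in the text.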










5. Two approaches to the limit are possible. First fix $k$, the number of colours, and let
$r$, the total number of balls, increase without limit. Then we have that

\[ \frac{S - F_2/r}{\sigma} \]

is in the limit a unit normal variable where

\[ \sigma^2 = \frac{F_2(r-3)}{r(r-1)} + \frac{F_2^2}{r^2(r-1)} - \frac{2F_3}{r(r-1)}. \]

This result follows by generalizing Mood’s procedure for runs of one colour. As an
alternative we can fix the possible numbers of each colour, i.e. we can put an upper bound on $r_i$,
and let $k$ and therefore $r$ tend to infinity. Under these conditions we have approximately

\[ \mu_{[w]}(S) = \mathcal{E}\Big( \sum_{t_1 \ne t_2 \ne \dots \ne t_w} \alpha_{t_1} \alpha_{t_2} \dots \alpha_{t_w} \Big) \simeq \frac{F_2^{\,w}}{r^{w}}, \]

since the other sums in which some or all of the subscripts are equal will be of lower order
in $r$. Accordingly if we put

\[ \lambda = F_2/r, \]

then

\[ \mu_{[w]}(S) \to \lambda^{w}, \]

which is the $w$th Poisson factorial moment. This indicates that $S$ tends in the limit to be
distributed as a Poisson variable with parameter $\lambda$. A rigorous proof of this can be given
following the lines already set out elsewhere for moments of a similar structure (Barton,
1957). We have therefore that as $k$ (and $r$) increase without limit the distribution of the total
number of runs tends to that of Poisson’s binomial limit with parameter $F_2/r$. In practice
this limit will not be reached until $k$ is large.


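6. Following Aitken (1939) we define the factorial cumulants $\kappa_{[i]}$ by the relation

\[ \kappa_t = \sum_{i=1}^{t} \frac{\Delta^i 0^t}{i!}\, \kappa_{[i]}, \]

where $\kappa_t$ is the ordinary moment cumulant. The first four of these are

\[ \kappa_1 = \kappa_{[1]}, \quad \kappa_2 = \kappa_{[2]} + \kappa_{[1]}, \quad \kappa_3 = \kappa_{[3]} + 3\kappa_{[2]} + \kappa_{[1]}, \quad \kappa_4 = \kappa_{[4]} + 6\kappa_{[3]} + 7\kappa_{[2]} + \kappa_{[1]}. \]

From these relations we may find the factorial cumulants of $S$:

\[ \kappa_{[1]}(S) = \frac{F_2}{r}, \qquad \kappa_{[2]}(S) = \frac{F_2^2 - 2rF_2 - 2rF_3}{r^2(r-1)}, \]

\[ \kappa_{[3]}(S) = \frac{1}{r^3(r-1)(r-2)} \big[ 4F_2^3 - 12rF_2^2 + 8r^2F_2 - 12rF_2F_3 + 28r^2F_3 + 10r^2F_4 \big]. \]

The expression for $\kappa_{[4]}(S)$ is lengthy and we do not reproduce it here. Write

\[ \lambda = F_2/r, \qquad \lambda_j = F_j/r, \]

whence

\[ \kappa_{[1]}(S) = \lambda, \qquad \kappa_{[2]}(S) = \frac{\lambda^2 - 2\lambda - 2\lambda_3}{r-1}, \qquad \kappa_{[3]}(S) = \frac{1}{(r-1)(r-2)} \big[ 4\lambda(\lambda-1)(\lambda-2) - 6\lambda_3(2\lambda-3) + 10(\lambda_3 + \lambda_4) \big]. \]

These cumulants simplify considerably if we let

\[ r_i = R \quad (i = 1, \dots, k), \]

for in this case $\lambda_j = \lambda^{(j-1)} = \lambda(\lambda-1)\dots(\lambda-j+2)$.
Thus for equal numbers of each colour we have

\[ \kappa_{[1]} = \lambda, \quad \kappa_{[2]} = -\frac{\lambda^2}{r-1}, \quad \kappa_{[3]} = \frac{2\lambda^2(\lambda-1)}{(r-1)(r-2)}, \quad \kappa_{[4]} = -\frac{6\lambda^2}{(r-1)(r-2)(r-3)} \Big[ \lambda^2 - 4\lambda + 2 + \frac{\lambda^2}{r-1} \Big]. \]

If we consider a simple binomial expression, say

\[ (q+p)^r, \]

where $\lambda = rp$, the factorial cumulants are

\[ \kappa_{[1]} = \lambda, \quad \kappa_{[2]} = -\frac{\lambda^2}{r}, \quad \kappa_{[3]} = \frac{2\lambda^3}{r^2}, \quad \kappa_{[4]} = -\frac{6\lambda^4}{r^3}. \]

It will be noted that the leading term in the factorial cumulant of $S$ is the same as the
factorial cumulants of the binomial in the four cases given.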
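
The comparison of §6 can be reproduced numerically. The following sketch converts the factorial moments of $S$ into factorial cumulants by the usual moment-to-cumulant relations and sets them beside those of the binomial $(q+p)^r$ with $\lambda = rp$. It assumes the moment formulas of §4 in the form coded here; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def falling(x, j):
    """Descending factorial x(x - 1)...(x - j + 1)."""
    out = 1
    for i in range(j):
        out *= x - i
    return out


def factorial_cumulants_of_S(colours):
    """First four factorial cumulants of S for the given colour counts (r >= 4)."""
    r = sum(colours)
    F2, F3, F4, F5 = (sum(falling(ri, j) for ri in colours) for j in (2, 3, 4, 5))
    m1 = Fraction(F2, r)
    m2 = Fraction(F2 * (F2 - 2) - 2 * F3, falling(r, 2))
    m3 = Fraction(F2 * (F2 - 2) * (F2 - 4) - 6 * F3 * (F2 - 3) + 10 * F3 + 10 * F4,
                  falling(r, 3))
    m4 = Fraction(
        F2 * (F2 - 2) * (F2 - 4) * (F2 - 6) - 12 * F3 * (F2 - 3) * (F2 - 5)
        + 40 * F4 * (F2 - 4) + 40 * F3 * (F2 - 3) + 12 * F3 * (F3 - 3)
        - 84 * F5 - 296 * F4 - 120 * F3,
        falling(r, 4),
    )
    # factorial cumulants follow from factorial moments exactly as
    # ordinary cumulants follow from moments about the origin
    return [
        m1,
        m2 - m1 ** 2,
        m3 - 3 * m2 * m1 + 2 * m1 ** 3,
        m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4,
    ]


def binomial_factorial_cumulants(index, lam):
    """kappa_[j] = (-1)^(j-1) (j-1)! lam^j / index^(j-1) for j = 1..4."""
    lam = Fraction(lam)
    return [(-1) ** (j - 1) * factorial(j - 1) * lam ** j / Fraction(index) ** (j - 1)
            for j in range(1, 5)]


if __name__ == "__main__":
    print(factorial_cumulants_of_S((3, 3, 3, 3)))   # r = 12, R = 3, lambda = 2
    print(binomial_factorial_cumulants(12, 2))
```

For r = 12 and R = 3 the second and third factorial cumulants of $S$ are $-4/11$ and $4/55$, against $-1/3$ and $1/9$ for the binomial, which illustrates how close the two sets are in their leading terms.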


7. The similarity between the first four cumulants of the distribution of $S$, when the
numbers of each colour are the same, and those of the simple binomial, coupled with the
fact that two possible distributions under conditions analogous to those of proceeding to
binomial limits are the normal distribution and Poisson, suggests that a binomial
distribution may be a useful approximation to the distribution of $S$. The case r = 12 with four (k)
colours, each three (R) in number was considered. $S$ can take values 0, 1, 2,..., 8, and

\[ \kappa_1(S) = \lambda = R-1, \qquad \kappa_2(S) = \frac{F_2^2}{r^2(r-1)} + \frac{F_2(r-3)}{r(r-1)} - \frac{2F_3}{r(r-1)} = \frac{(R-1)(r-R)}{r-1}. \]

Three approximations to the distribution of $S$ are now considered. First we let the
binomial index equal 8, and fit the binomial

\[ (q+p)^n = (\tfrac{3}{4} + \tfrac{1}{4})^8, \]

the ‘p’ of the binomial being found from

\[ 8p = \kappa_1 = 2. \]

This is approximation I of Table 3. Secondly, we equate the first two moments of $S$, i.e.
$R-1$ and $(R-1)(r-R)/(r-1)$, to the first two binomial moments, which is equivalent to
taking

\[ p = \frac{R-1}{r-1} = \frac{2}{11}, \qquad n = \frac{\lambda}{p} = r-1 = 11. \]

This is approximation II. Thirdly, we may calculate areas corresponding to a normal curve
with the correct mean and variance of $S$. This is approximation III. The approximate and
exact distributions are given in Table 3, the normal curve area in the first group representing
the whole left-hand tail. Approximation II, the simple binomial with the correct first two
moments of $S$, is clearly the best, but no mistake is likely in a test of significance whichever
approximation is used.
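
The three approximations for r = 12, R = 3 are easily recomputed. The sketch below is an illustration under the definitions just given: the grouping of the normal curve at the half-integers, with the two end groups taking the tails, is an assumption about how the areas were formed, and the helper names are hypothetical.

```python
from math import comb, erf, sqrt


def binomial_pmf(n, p, s):
    """Probability of s successes in a binomial with integer index n."""
    return comb(n, s) * p ** s * (1 - p) ** (n - s)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


r, R = 12, 3
mean = R - 1                           # kappa_1(S)
var = (R - 1) * (r - R) / (r - 1)      # kappa_2(S)

# Approximation I: index 8, p fixed by the mean
approx_I = [binomial_pmf(8, mean / 8, s) for s in range(9)]

# Approximation II: first two moments matched, p = (R-1)/(r-1), n = r-1
approx_II = [binomial_pmf(r - 1, (R - 1) / (r - 1), s) for s in range(9)]

# Approximation III: normal areas between half-integers, tails in the end groups
sd = sqrt(var)
cdf = [normal_cdf((s + 0.5 - mean) / sd) for s in range(8)]
approx_III = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, 8)] + [1.0 - cdf[7]]

if __name__ == "__main__":
    for name, row in (("I", approx_I), ("II", approx_II), ("III", approx_III)):
        print(name, " ".join(f"{x:.4f}" for x in row))
```

The leading entries (about 0.1001, 0.1100 and 0.1205 for S = 0) agree with the corresponding rows of Table 3.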













7. Although it is possible to show similarity between the factorial binomial cumulants
and those of the distribution of $S$ in a simple way when there are equal numbers of each
colour, the approximations to the distribution of $S$ are so good for this particular case that
one might expect them to be reasonable for the case when the numbers of each colour are
not the same. We found that this was so. It is supposed there are 6 balls of one colour, 4 of
another and 1 of each of two other colours. Thus if we consider approximation I and assume
a binomial index of 8, we have

\[ 8p = \kappa_1(S) = 3.5, \]

and the binomial is $(\tfrac{9}{16} + \tfrac{7}{16})^8$.


Table 3. Comparison of true distribution of S with three approximations (r = 12, [3⁴])

S                      0        1        2        3        4        5        6        7        8
Approximation I      0.1001   0.2670   0.3115-  0.2076   0.0865+  0.0231   0.0038   0.0004   0.0000
Approximation II     0.1100   0.2689   0.2987   0.1992   0.0885+  0.0275+  0.0061   0.0010   0.0001
Approximation III    0.1205   0.2274   0.3042   0.2274   0.0952   0.0222   0.0029   0.0002     -
True                 0.1118   0.2670   0.2966   0.2003   0.0903   0.0275+  0.0057   0.0008   0.0001

Table 4. Comparison of true distribution of S with three approximations (r = 12, [6 4 1²])

S                      0        1        2        3        4        5        6        7        8
Approximation I      0.0100   0.0624   0.1698   0.2641   0.2567   0.1598   0.0621   0.0138   0.0013
Approximation II     0.0083   0.0563   0.1654   0.2714   0.2697   0.1632   0.0564   0.0091   0.0002
Approximation III    0.0126   0.0552   0.1599   0.2723   0.2723   0.1599   0.0552   0.0112   0.0013
True                 0.0054   0.0574   0.1688   0.2749   0.2673   0.1582   0.0567   0.0104   0.0009
For approximation II

\[ \sigma^2 = \frac{r-3}{r(r-1)} \sum r_i(r_i-1) + \frac{1}{r^2(r-1)} \Big( \sum r_i(r_i-1) \Big)^2 - \frac{2}{r(r-1)} \sum r_i(r_i-1)(r_i-2), \]

i.e. $\sigma^2 = 79/44$,
giving the binomial

\[ (\tfrac{79}{154} + \tfrac{75}{154})^{539/75} = (0.51299 + 0.48701)^{7.187}. \]

The fractional value of the index was used to calculate the probabilities. In the third
approximation it was assumed that $S$ is normally distributed with mean 3.5 and variance 79/44,
the tail areas being put together in each of the two end groups. The agreement is surprisingly
good considering that $r$ is small and that there is disparity between the numbers of each
colour. The results would suggest that for r > 12, whatever the composition of the numbers
of colours, the normal approximation (III) will be adequate for tests of significance.
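
The constants used for the unequal case can be recovered from the mean and variance of $S$. The sketch below takes the variance in the form $F_2^2/\{r^2(r-1)\} + F_2(r-3)/\{r(r-1)\} - 2F_3/\{r(r-1)\}$ and matches a binomial by setting $np$ equal to the mean and $npq$ equal to the variance; the function names are illustrative.

```python
from fractions import Fraction


def falling(x, j):
    """Descending factorial x(x - 1)...(x - j + 1)."""
    out = 1
    for i in range(j):
        out *= x - i
    return out


def mean_and_variance(colours):
    """Exact mean and variance of S for the given colour counts."""
    r = sum(colours)
    F2 = sum(falling(ri, 2) for ri in colours)
    F3 = sum(falling(ri, 3) for ri in colours)
    mean = Fraction(F2, r)
    var = (Fraction(F2 * F2, r * r * (r - 1))
           + Fraction(F2 * (r - 3), r * (r - 1))
           - Fraction(2 * F3, r * (r - 1)))
    return mean, var


def matched_binomial(colours):
    """Binomial (q + p)^n with np = mean and npq = variance; n may be fractional."""
    mean, var = mean_and_variance(colours)
    p = 1 - var / mean
    return p, mean / p


if __name__ == "__main__":
    mean, var = mean_and_variance((6, 4, 1, 1))
    p, n = matched_binomial((6, 4, 1, 1))
    print(mean, var)            # 7/2 79/44
    print(float(p), float(n))   # about 0.48701 and 7.187
```

The same routine applied to four colours of three balls each returns $p = 2/11$ and $n = 11$, the values of approximation II in the equal case.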










8. An extension of the theory of runs can be used in what we might call a test for the
persistence of type. Assuming the $r$ events of $k$ possible types, the null hypothesis will be that
these $r$ events are a sample from a multinomial, the probability of the $i$th type of which is
$p_i$. This is to say that under $H_0$ we assume

\[ P(r_1, r_2, \dots, r_k) = \frac{r!}{\prod_{i=1}^{k} r_i!} \prod_{i=1}^{k} p_i^{r_i}, \]

with each of the

\[ \frac{r!}{\prod_{i=1}^{k} r_i!} \]

sequences being equally likely. The alternate hypothesis, $H_1$, might be that we have a simple
Markoff chain in which the probability of getting an event of the $i$th type at any given
drawing is greater if the immediately preceding drawing also is of the $i$th type and less if
otherwise. We may write these transition probabilities

\[ p_{ii} = p_i(1+\theta), \]
\[ p_{ij} = p_j(1-\theta W) \quad (i \ne j), \]

where $W = \sum p_i^2 / (1 - \sum p_i^2)$.

Assuming $\theta = 0$ for the first drawing of the sample, the probability of any given sequence
in which there are $T$ groups is

\[ (1+\theta)^{r-T} (1-\theta W)^{T-1} \prod_{i=1}^{k} p_i^{r_i}. \]

If we now consider the conditional distribution under $H_1$ for a sequence of given composition
of numbers of each type, this is equal to

\[ K \left( \frac{1-\theta W}{1+\theta} \right)^{T}, \]

where $K$ is a factor of proportionality and depends only on $r_1, \dots, r_k$, and

\[ \frac{1-\theta W}{1+\theta} = \phi \ \text{(say)}. \]

It follows that $T$ is sufficient for $\phi$ and that the likelihood ratio test for $\theta = 0$, or
equivalently $\phi = 1$, is a function of $T$. Thus the use of $T$ is equivalent to the likelihood ratio
test in this case.


9. We have been interested in the multiple runs problem for its own sake but, apart from 
the simple stochastic problem of the preceding section, it is perhaps worth while to describe 
one possible analysis of variance application. The use of runs in statistical theory is obvious, 
but there appears room for them in the category which Tukey has aptly described elsewhere 
as ‘quick and dirty’. By quick and dirty tests we would understand tests which are easy 
to apply but probably of low power, so that any effect which is significant using them will 
certainly be more significant with a more sensitive test. On the other hand, if the looked-for 
effect is not significant with the ‘quick and dirty’ test this does not necessarily mean that it 
cannot be picked out by more refined methods. We illustrate the working on an example of 






















the breaking strength of cement-mortar briquettes (Table 5). We rank the observations
in order of magnitude, ties being decided by tossing a coin. The ranking is simply in order
to enable the runs to be counted more easily. Each group will correspond to a different
colour. The number of runs is 22 and


S = 25-22=3, &(S)=4, var(S) =12. 


Assuming S to be normally distributed we see that significance is not achieved, a result which,
in this case, is confirmed by the orthodox F-test.


Table 5. Breaking strength of cement-mortar briquettes 














| 
| 1 | 2 3 | 4 | 5 
eel sa! | UR a 

| 

Tension in Ib. | 518 (5) | 508 (3) | 538(12) | 585 (9) | 492 (l 
| 560 (20) | 574 (23) 544 (15) | 540 (14) 506 ( 

538 (11) 528 (6) 554 (18) | 550 (17) 528 (" 

| 510 (4) | 584 (8) 579 (24) | 555 (19) 536 (10 
| 544 (6) | 538 (13) 598 (25) | 567 (21) 572 (2° 





10. Although the analysis of variance runs test has little power this will not be the case
for all applications. We owe the following elegant illustration to E. S. Pearson. Consider
the falls in the price of shares on the London Stock Exchange during the period 6 November
to 8 December 1956, both dates inclusive. Five types of industrial activity, A Insurance,
B Breweries and Distilleries, C Electrical Equipment and Radio, D Motor and Aircraft,
E Oil, were chosen for study. The closing prices as given in The Times for eighteen businesses
of each type were taken, and Table 6 shows, for each day, the type of industrial activity for
which the greatest number of the eighteen showed a fall in price from the previous day.
In the few cases where there were equal numbers for two types, that type which also showed
the fewer rises in price was taken:


Table 6. Type of industrial activity showing greatest number of falls in price of shares

 6 Nov.  A     13 Nov.  B     20 Nov.  E     27 Nov.  B     4 Dec.  C
 7 Nov.  A     14 Nov.  C     21 Nov.  C     28 Nov.  E     5 Dec.  C
 8 Nov.  D     15 Nov.  C     22 Nov.  E     29 Nov.  A     6 Dec.  D
 9 Nov.  D     16 Nov.  C     23 Nov.  E     30 Nov.  E     7 Dec.  C
10 Nov.  A     17 Nov.  E     24 Nov.  E      1 Dec.  E     8 Dec.  B


We have $r = 25$ with $r_A = 4$, $r_B = r_D = 3$, $r_C = 7$, $r_E = 8$, $T = 16$ and $S = r - T = 9$.
From formulae we find

\[ F_2 = \sum_{i=1}^{5} r_i(r_i-1) = 122, \qquad F_3 = \sum_{i=1}^{5} r_i(r_i-1)(r_i-2) = 582, \]


fod, A eo 

r—1) r(r—1) r(r—1) 

a(S) = 1-241 and 4&(S) = 4:88, 

oo, 
l- 


24] > 3 . 





H(8) = a = 1-54106, 


so that 


Tg 










This normal deviate is significant, and we conclude therefore that the test is picking out 
the fact that during the Middle East crisis there was some persistence from day to day in 
the way in which different classes of shares were affected. 
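
The counts entering this example can be read off Table 6 mechanically. The short sketch below takes the 25 daily entries in date order (the string literal is transcribed from the table) and computes the number of runs, $S$, the colour totals, $F_2$, $F_3$ and the mean $F_2/r$; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby

# Table 6 in date order, one letter per trading day, week by week
sequence = "AADDA" + "BCCCE" + "ECEEE" + "BEAEE" + "CCDCB"

r = len(sequence)
T = sum(1 for _ in groupby(sequence))   # number of runs
S = r - T                               # like-type contacts
totals = Counter(sequence)              # r_A, r_B, ...

F2 = sum(n * (n - 1) for n in totals.values())
F3 = sum(n * (n - 1) * (n - 2) for n in totals.values())
mean_S = Fraction(F2, r)

if __name__ == "__main__":
    print(dict(totals))
    print(r, T, S, F2, F3, float(mean_S))
```

The observed $S = 9$ lies well above its expectation of 4.88, which is the direction expected when a type tends to persist from one day to the next.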


11. In nearly all statistical applications of the theory of runs the test of significance will
be one-tailed. For example, in Wald & Wolfowitz’s application of the two-colour
distribution to the two-sample means test, a small number of runs could be held to indicate a possible
separation of the population means. The critical region in their case will be the tail where
T is small or S large. This is also true for the applications given above; for instance, in the
analysis of variance application a complete separation of the twenty-five observations
into five groups would indicate the possibility of the five population means being different
instead of equal under the null hypothesis. We choose therefore the lower tail of T or the
upper tail of S. The result when there is too great an alternation of colour and T is significant
at the upper tail has not so obvious a statistical interpretation.


REFERENCES 


AITKEN, A. C. (1939). Statistical Mathematics. Edinburgh: Oliver & Boyd.
BARTON, D. E. (1957). J. R. Statist. Soc. B (to appear).
MOOD, A. M. (1940). Ann. Math. Statist. 11, 367.
STEVENS, W. L. (1939). Ann. Eugen., Lond., 9, 10.
WALD, A. & WOLFOWITZ, J. (1940). Ann. Math. Statist. 11, 112.
WHITWORTH, W. A. (1886). Choice and Chance. Cambridge: Deighton Bell and Co.





Table 1. Three-colour runs (S = r—T) 





(The probabilities are obtained by dividing the number tabled by the corresponding value of r!/Πr_i!.)

  r1  r2  r3 | r!/Πr_i! |  Values of T:  3  4  5  6  7  8  9  10  11  12 |
| 
= 
2 2 2 909 | 6 18 36 30 
>. "4 60 6 18 26 10 
eo4; 23 30 | 6 18 6 as 
3 2 21! 210| 6 24 ~~ 62 80 38 
$ 3s 1] mole 52 40 18 
so? 4 1065 | 6 24 42 30 3 
. #@. 444.4614. * 12 ee os | 
3 3 21] 560 | 6 30 100 180 170 74 | 
‘ 4 420 | 6 30 90 150 120 24 | 
_— 280 | 6 30 80 90 60 14 | 
ee ae 168 | 6 30 60 60 12 “ 
. <4 els eae SS rl uf 
| 3 3 3 | 1680 | 6 36° 150 360 510 444 174 
| 4 38 2 | 1260 | 6 36 140 310 405 284 79 
; $$ 51 et = SS 240 252 96 6 
4 4 1/1] 630 | 6 36 = 120 180 180 84 24 
5 8 1| 504 | 6 36 ~= 110 160 132 56 4 | 
wee Se oe eee ee ee 
= a 7216 3 30 - a a = 
| 4 3 38 | 4200 | 6 42 202 580 ©1050 = 1234 838 248 | 
|@ #@ 3] 3100) 6 4 192 510 870 894 498 138 | 
| 5 3 2 | 2520 | 6 42 182 470 752 692 332 44 
| 5 4 #1{ 1260 | 6 42 ~~ 162 300 372 252 108 18 
| 6 2 2] 1260 | 6 42 = 152 350 440 240 30 at 
ae oe 840 | 6 42 142 250 240 140 20 oe 
| ; 3 4 360 | 6 42 102 150 60 “en on - | 
= te 909 | 6 42 42 oe “ait me ee _ 
14 4 38 1/11550 | 6 48 266 900 2010 3064 3012 1764 480 | 
15 3 3 | 9240 | 6 48 256 840 1802 2568 2340 1168 212 
5 4 2 | 6930 | 6 48 246 750 1527 1968 1548 702 135 
6 3 2 | 4620 | 6 48 226 660 1220 1360 870 220 10 
5 6 1| 2772 | 6 48 216 480 744 672 432 144 30 | 
6 4 1 +1] 2310 | 6 48 206 450 645 560 300 90 5 
7 3) @ | om | 6 oe 480 690 480 90 ne —_ 
| 7 8 1 | 1320 | 6 48 176 360 390 280 60 ae a 
i a 1 495 | 6 48 126 210 105 — — — 
oe 110 | 6 48 56 si = ~- _ n ~ 
4 4 4 | 34650 | 6 54 342 1350 3618 6894 9036 79388 4320 1092 
| 5 4 3 | 27720 | 6 54 332 1270 3300 5974 7388 5982 2826 588 
| 6 3 3 |18480 6 54 312 1140 2778 4570 5060 3360 #1100 100 
| 5 5 2 | 16632 | 6 54 312 1080 2592 4104 4272 2880 1110 222 
6 4 2 | 13860 | 6 54 302 1030 2388 3620 3550 2130 710 70 
7 3 2 | 7920 | 6 54 272 880 1818 2350 1920 660 50 _ 
6 5 1+! 5544 | 6 54 272 700 1320 1400 1120 540 110 22 
7 4 #41 | 3960 | 6 54 252 630 1008 1050 660 270 30 — 
8 2 2 | 2970 | 6 54 222 630 1008 840 210 _ _ — 
8 3 1 /| 1980 | 6 64 212 490 588 490 140 Joe — — 
9 2 1+! #660] 6 654 ~= 152 280 168 mae Ate _ — — 
1 1 41 132 | 6 54 72 ne — on ~ ™ — — 











Table 2. Four-colour runs (S = r−T)

(The probabilities are obtained by dividing the number tabled by the corresponding value of r!/Πr_i!.)





10 


11 


12 





r; 


wm G bo 


ore OFC em & & WO oe F&O & m CW Co bo 


oe OC > 


> S or 


"2 


bo Ww bo bo m bo bo 


bo Ww bo Ww bo 


wr dS wd ww 


wre » WP bo WH & Ww 


em Or bo OP WOR GO OO 


— = bo bo 


bo bo be 


— 


— = bo bo bo & be 


NOR we bw bw wd & 


— 


mm bo bo bo Go & bo SS & Ww 


— 


— — — DO —— & bo 


bo = bo 


bo = bo bo — = 


— =e ee 


—_ i RD ee ee DS DD DS Ge 








ri 
IIr;! 


180 
120 


630 
420 
210 


2520 
1680 
1120 

840 


7560 
5040 
3780 
2520 
1512 


25200 
16800 
18900 
12600 
7560 
6300 
5040 


92400 
69300 
46200 
41580 


34650 | 


27720 
13860 
13860 

9240 


369600 
277200 
207900 
166320 
138600 
110880 
83160 
83160 
55440 
33264 
27720 








2112 
2052 
1992 
1932 
1932 
1872 
1812 
1752 
1692 
1632 
1572 








Values of 7 
7 8 9 10 11 12 
246 
96 
6 
984 864 
684 384 
384 184 
294 54 
2010 2880 1686 
1560 1720 836 
1320 1260 336 
870 660 186 
600 216 12 
3720 7480 8416 4204 
3120 5160 5016 2184 
3330 6210 6066 1974 
2730 4170 3366 1074 
2160 2736 1368 156 
1740 1980 1116 324 
1560 1536 768 96 
6360 16680 27756 27408 12336 
5820 14400 22056 18708 6516 
5070 10720 14256 10848 3566 
4950 11016 14184 8364 1386 
4350 9000 10656 6768 2016 
4200 7896 8484 4704 816 
2910 4176 3384 1584 306 
3210 4920 3480 780 30 
2460 2920 1980 480 20 
10176 33360 74016 109632 98688 41304 
9486 29590 61916 83952 66638 23254 
8796 26100 51216 62892 43128 13464 
8316 23856 44520 50484 30468 6432) 
7896 20580 35616 39312 25428 7524! 
7416 18616 30320 31104 17528 3712 
6726 15966 23820 21384 10818 2322 
6876 17460 27120 22080 7020 540 
5976 13060 17120 12780 4160 340 
4656 8352 9024 6336 2448 504 
4386 7410 7620 4680 1590 150) 













BINOMIAL SAMPLING SCHEMES AND THE CONCEPT 
OF INFORMATION 


By D. V. LINDLEY 
Statistical Laboratory, University of Cambridge 


1. Summary. Methods of sampling a binomial population in order to obtain a prescribed 
accuracy in the determination of the unknown proportion are discussed. The concept of 
information due to Shannon is used, and the relationship between it and the corresponding 
concept of Fisher’s is investigated and used to explain certain features of the sampling 
schemes. 


2. We consider random sampling from a population in which each individual can be 
classed as A or not-A, the unknown proportion of A’s in the population being ¢. The problem 
discussed is how much sampling should be done in order to obtain sufficient knowledge, in 
some sense, about the value of 6. In a previous paper (Lindley, 1956) we have proposed 
a solution using the idea of information, and in the present paper we discuss in detail the 
application of this idea to binomial sampling. 

It is supposed that at any stage of the sampling the knowledge of @ can be expressed by 
a probability distribution p(9). If this is so the quantity 


Ip = 1(p(0)) = | p(0) log p(9) a0 (1) 


provides a convenient measure of the amount of information about 0; indeed, accepting a 
few mild requirements on the properties that ‘amount of information’ should possess, it 
has been shown by Shannon that J, is, apart from an arbitrary multiplying constant, unique. 
(The constant is incorporated in the arbitrariness of the base of the logarithm.) Details 
will be found in the paper already cited. The effect of sampling will be to alter p(0), according 
to Bayes’s theorem, and so to change J,. The rule we propose is: continue sampling until I, 
has reached some prescribed value. 
To facilitate the calculations it will be supposed that p(@) belongs to the family 


Pap(9) = O9-\(1 — 8)? D(a + b)/{T(a) P(6)}, (2) 


with a and b positive. This family of densities has the property that if the prior distribution
is $p_{ab}(\theta)$ then the posterior distribution, after a single binomial trial has been performed, is
$p_{a+1,b}(\theta)$ or $p_{a,b+1}(\theta)$ according as the result of the trial was A or not-A respectively. This fact
permits representation of the sampling schemes on a diagram with axes giving the values
of a and b. Simple calculations show that

$$I(p_{ab}(\theta)) = \ln[\Gamma(a+b)/\{\Gamma(a)\,\Gamma(b)\}] + (a-1)[\Psi(a)-\Psi(a+b)] + (b-1)[\Psi(b)-\Psi(a+b)], \tag{3}$$

where natural logarithms have been used in (1) and $\Psi(x) = d \ln \Gamma(x)/dx$.
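The updating property of the family (2) and formula (3) lend themselves to a short numerical sketch. The Python below is illustrative only and is not part of the paper: the function names, the recurrence-plus-series evaluation of Ψ, the seed and the value of the true proportion used to simulate trials are all assumptions made for the example.

```python
import math
import random


def digamma(x):
    """Psi(x) = d ln Gamma(x)/dx for x > 0 (recurrence, then asymptotic series)."""
    shift = 0.0
    while x < 6.0:                      # Psi(x) = Psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series


def info_theta(a, b):
    """Formula (3): integral of p log p for the density (2), natural logarithms."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    psi_ab = digamma(a + b)
    return (log_const
            + (a - 1) * (digamma(a) - psi_ab)
            + (b - 1) * (digamma(b) - psi_ab))


def update(a, b, is_A):
    """Indices of the posterior after one trial: A raises a, not-A raises b."""
    return (a + 1, b) if is_A else (a, b + 1)


def sample_until(target, a, b, true_theta, rng, max_trials=100000):
    """Continue sampling until formula (3) reaches `target` (or a safety cap is hit)."""
    trials = 0
    while info_theta(a, b) < target and trials < max_trials:
        a, b = update(a, b, rng.random() < true_theta)
        trials += 1
    return a, b, trials


if __name__ == "__main__":
    rng = random.Random(1957)                       # arbitrary seed
    a, b, n = sample_until(1.0, 1, 1, 0.3, rng)     # hypothetical true proportion 0.3
    print(a, b, n, info_theta(a, b))
```

With these definitions the uniform case a = b = 1 gives zero, and a = 2, b = 1 gives ln 2 − ½. Moving from (2, 1) to (2, 2) lowers the value, so in this sketch a single further trial need not increase the measure when a and b are small.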


Application of the sampling rule proposed involves the sampling continuing until the
values of a and b obtained are such that (3) has attained a prescribed value. We defer
consideration of the details of this for a moment in order to note that the integral (1) is not
invariant under a change of description of the parameter values. Specifically if θ = θ(φ)
is a monotone function of φ then the probability distribution, q(φ), of φ will be given by

$$q(\phi)\, d\phi = p(\theta)\, d\theta, \tag{4}$$

and

$$I_\phi = I(q(\phi)) = \int q(\phi) \log q(\phi)\, d\phi = \int p(\theta) \log\left\{p(\theta)\,\frac{d\theta}{d\phi}\right\} d\theta = I_\theta + \int p(\theta) \log \frac{d\theta}{d\phi}\, d\theta. \tag{5}$$


3. Information about θ. The information about θ is given by (3). This expression, as it
stands, is too complicated to be easily understood but considerable simplification is possible
by use of the standard asymptotic formulae

$$\ln \Gamma(x) \sim \ln\sqrt{2\pi} - x + (x - \tfrac{1}{2}) \ln x,$$

Stirling’s formula, and

$$\Psi(x) \sim \ln x - 1/(2x).$$

We obtain

$$I(p_{ab}(\theta)) \sim \tfrac{1}{2} \ln\{(a+b)^3/(2\pi ab)\} - \tfrac{1}{2},$$

for large values of both a and b. It is, however, worth noting that both the asymptotic
formulae are remarkably accurate, and that therefore when we refer to large a and b we
often only mean that they are both greater than 5. It follows that the boundary in the (a, b)
diagram is a curve of the form

$$(a+b)^3 = Aab, \tag{6}$$


where A is a constant dependent on the amount of information required. Thus if the prior 
distribution is such that the point (a,b) lies to the ‘south-west’ of the curve in the first 
quadrant, sampling is continued until the curve is crossed. The general features of the 
boundary (drawn roughly in Fig. 1) and its form for small a and b have been discussed 
previously (Lindley, 1956). The main point of interest is the manner in which the boundary 
approaches the origin when either a or b is small; the approach is faster than even (6) would 
suggest. This agrees with the ‘common-sense’ feeling that if the same thing continually 
happens, say the sun rises each morning, then we are much better informed than we would 
be if there was known to be even a single non-occurrence. 
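The quality of the large-sample form can be examined numerically. The following Python sketch, which is not part of the paper, compares formula (3) with the asymptotic expression and recovers the constant A of (6) from a prescribed amount of information; the function names and the series used for Ψ are assumptions of the example.

```python
import math


def digamma(x):
    """Psi(x) = d ln Gamma(x)/dx for x > 0 (recurrence, then asymptotic series)."""
    shift = 0.0
    while x < 6.0:                      # Psi(x) = Psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series


def info_theta(a, b):
    """Formula (3), natural logarithms."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    psi_ab = digamma(a + b)
    return (log_const
            + (a - 1) * (digamma(a) - psi_ab)
            + (b - 1) * (digamma(b) - psi_ab))


def info_theta_asymptotic(a, b):
    """Large-(a, b) form: (1/2) ln{(a + b)^3 / (2 pi a b)} - 1/2."""
    return 0.5 * math.log((a + b) ** 3 / (2 * math.pi * a * b)) - 0.5


def boundary_constant(target):
    """Constant A in (6) implied by setting the asymptotic form equal to `target`."""
    return 2 * math.pi * math.exp(2 * target + 1)


if __name__ == "__main__":
    for a, b in [(5, 5), (5, 20), (50, 50)]:
        print(a, b, info_theta(a, b), info_theta_asymptotic(a, b))
```

In this sketch the two expressions differ by about 0.05 at a = b = 5 and by about a tenth of that at a = b = 50.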


4. Information about φ = 2 arcsin √θ. Here dθ/dφ = {θ(1−θ)}^{1/2}, and on evaluating the
final term in (5) and adding it to $I_\theta$ given by (3) we obtain

$$I_\phi = \ln[\Gamma(a+b)/\{\Gamma(a)\,\Gamma(b)\}] + (a-\tfrac{1}{2})[\Psi(a)-\Psi(a+b)] + (b-\tfrac{1}{2})[\Psi(b)-\Psi(a+b)]. \tag{7}$$

By use of the asymptotic formulae it can be shown that for large a and b

$$I_\phi \sim \tfrac{1}{2}\ln\{(a+b)/(2\pi)\} - \tfrac{1}{2}, \tag{8}$$


and hence the boundary of the region required is a+b = constant, that is a fixed sample
size scheme. When, however, either a or b is small the asymptotic result does not provide
a good approximation. Consider the situation when a = 1. If b is large the asymptotic
expressions yield

$$I_\phi \sim \tfrac{1}{2}\ln b - \tfrac{1}{2}\gamma - 1 \tag{9}$$

instead of (8). (γ is Euler’s constant, equal to −Ψ(1).) If it has been decided to use a prescribed
amount of information resulting, when a and b are large, in a fixed sample of size n, the
critical amount of information when a = 1 will be attained by a value of b satisfying

$$\tfrac{1}{2}\ln b - \tfrac{1}{2}\gamma - 1 = \tfrac{1}{2}\ln\frac{n}{2\pi} - \tfrac{1}{2}$$

on equating (8) and (9). Hence

$$b = n[e^{1+\gamma}/(2\pi)]. \tag{10}$$

The constant in square brackets is about 0.77. In other words if a = 1 only about three-quarters
the amount of sampling (every observation being not-A) is necessary to obtain
the same amount of information about 2 arcsin √θ as when a is large. Thus, for example,
if we are initially ignorant about θ we should presumably take a = b = 1: or if ignorant
about φ, a = b = ½, it makes little difference which. In either case we should obtain the
requisite information about φ more quickly if every trial resulted in not-A than we would
if A’s and not-A’s were mixed. This is the same phenomenon that was observed when
acquiring information about θ but here it is not nearly so striking. The constant size boundary
is adequate for values of a (or symmetrically of b) greater than 5. Thus when a = 5 the
constant factor corresponding to that in (10) is 0.96. The general form of the boundary is
shown in Fig. 2.
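The constants 0.77 and 0.96 quoted above can be checked with a few lines of Python. The sketch is not part of the paper. It assumes, by the same asymptotic steps that lead to (9), that for fixed a and large b formula (7) behaves like ½ ln b − ln Γ(a) + (a − ½)Ψ(a) − a; this limiting form and all function names are introduced here for illustration.

```python
import math


def digamma(x):
    """Psi(x) = d ln Gamma(x)/dx for x > 0 (recurrence, then asymptotic series)."""
    shift = 0.0
    while x < 6.0:                      # Psi(x) = Psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series


def info_phi(a, b):
    """Formula (7): information about phi = 2 arcsin sqrt(theta)."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    psi_ab = digamma(a + b)
    return (log_const
            + (a - 0.5) * (digamma(a) - psi_ab)
            + (b - 0.5) * (digamma(b) - psi_ab))


def limiting_constant(a):
    """Assumed limit of info_phi(a, b) - (1/2) ln b as b grows with a fixed."""
    return -math.lgamma(a) + (a - 0.5) * digamma(a) - a


def sample_fraction(a):
    """Ratio b/n found by equating (1/2) ln b + limiting_constant(a) with (8)."""
    return math.exp(-1.0 - 2.0 * limiting_constant(a)) / (2 * math.pi)


if __name__ == "__main__":
    for a in (1, 2, 5, 10):
        print(a, sample_fraction(a))
```

For a = 1 the limiting constant reduces to −½γ − 1, in agreement with (9), and the fraction reduces to the bracket in (10).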


5. Information about ψ = ln{θ/(1−θ)}. The consideration of a constant amount of
information for 2 arcsin √θ instead of θ is equivalent to asking for more information about
θ when it is near 0 or 1 than when it is near ½. Such a tendency can often be exaggerated
still further with profit. Suppose we imagine the continuous range of values of θ (0 < θ < 1)
replaced by a finite set of values so chosen that it would be important to distinguish between
two neighbouring values of θ but finer distinctions would be unimportant. Then if Δθ
denotes this minimal distinction it will often happen that Δθ/θ would be constant for θ
near 0; that is the percentage accuracy is fixed. Over the whole range of θ, Δθ/{θ(1−θ)}
might be constant, a form which reduces to the previous for small θ and which is symmetrical
in A and not-A. Let us find a transform, ψ, of θ, such that the same distinctions in ψ are
required over its entire range; then

$$\Delta\psi = \frac{\Delta\theta}{\theta(1-\theta)},$$

or, in passing to the limit,

$$\psi = \ln\{\theta/(1-\theta)\}. \tag{11}$$
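The passage to the limit can be written out in one line. The constant of integration is taken to be zero here, an assumption which places ψ = 0 at θ = ½:

```latex
\frac{d\psi}{d\theta} = \frac{1}{\theta(1-\theta)} = \frac{1}{\theta} + \frac{1}{1-\theta}
\quad\Longrightarrow\quad
\psi = \ln\theta - \ln(1-\theta) = \ln\{\theta/(1-\theta)\}.
```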


The same transformation arises naturally in another way, though its connexion with the 
present study is not clear and may be accidental. It is well known that the only distributions 
admitting a single minimal sufficient statistic for all sample sizes are those belonging to the 
exponential family, that is with likelihood of the form 


$$H(x)\, G(\psi)\, e^{u\psi},$$


where u is the minimal sufficient statistic, a function of the sample values x, and ψ is the
parameter. The binomial distribution belongs to this family with ψ = ln{θ/(1−θ)}. It is
worth noting that for any number of repetitions of a general binomial sample scheme the
statistics $\Sigma a_i$ and $\Sigma b_i$ are jointly sufficient. In most cases they will also be minimal sufficient
but in exceptional cases this will not happen. Thus in the fixed sample-size scheme and the
Haldane inverse binomial scheme (see below) $\Sigma b_i$ is alone minimal sufficient. This is because
$\Sigma a_i$ can, in those two special cases, be deduced from the sizes of the schemes (which are not
statistics) and, in the former case, $\Sigma b_i$.




















[Fig. 1, Fig. 2 and Fig. 3: sketches of the sampling boundaries in the (a, b) diagram.]

Figs. 1-3. Sampling schemes for providing a prescribed amount of information
about the parameters indicated. For the notation see text.


Consider, then, schemes which produce constant information for ψ. We have
dθ/dψ = θ(1−θ) and on evaluating the final term in (5) and adding it to $I_\theta$ given by (3)
we obtain

$$I_\psi = \ln[\Gamma(a+b)/\{\Gamma(a)\,\Gamma(b)\}] + a[\Psi(a)-\Psi(a+b)] + b[\Psi(b)-\Psi(a+b)]. \tag{12}$$

The asymptotic results used before yield

$$I_\psi \sim \tfrac{1}{2}\ln\{ab/[2\pi(a+b)]\} - \tfrac{1}{2}, \tag{13}$$

and hence the boundary of the region required is

$$\frac{ab}{a+b} = N, \text{ a constant}, \tag{14}$$


a rectangular hyperbola with asymptotes a = N, b = N: only the portion in the quadrant a > N,
b > N will be meaningful in the present situation. This obtains provided a and b are both
greater than about 5. Now, from (12), we have

$$\partial I_\psi/\partial a = a\Psi'(a) - (a+b)\Psi'(a+b),$$

which is positive since $\frac{d}{dx}[x\Psi'(x)] < 0$. Hence the information about ψ always increases
when an additional observation is incorporated; with the two previous parameters we saw
that this did not necessarily happen for small a and b. Thus the phenomenon connected
with constant repetition does not obtain here. From this it follows that (14) is a good
approximation to the boundary provided N > 5, that is provided the required amount of
information is sufficiently large. The general form is shown in Fig. 3. The case N < 5 does not
seem of sufficient importance to warrant a numerical investigation.
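The monotonicity just described, and the closeness of the hyperbola (14) to the exact boundary, can be explored with the Python sketch below. It is offered as an illustration under stated assumptions and is not part of the paper: the function names are hypothetical, and Ψ and Ψ′ are evaluated by a recurrence followed by a truncated asymptotic series.

```python
import math


def digamma(x):
    """Psi(x) for x > 0 (recurrence, then asymptotic series)."""
    shift = 0.0
    while x < 6.0:                      # Psi(x) = Psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series


def trigamma(x):
    """Psi'(x) for x > 0 (recurrence, then asymptotic series)."""
    shift = 0.0
    while x < 6.0:                      # Psi'(x) = Psi'(x + 1) + 1/x^2
        shift += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return shift + inv * (1 + inv * (0.5 + inv * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))))


def info_psi(a, b):
    """Formula (12): information about psi = ln{theta/(1 - theta)}."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    psi_ab = digamma(a + b)
    return log_const + a * (digamma(a) - psi_ab) + b * (digamma(b) - psi_ab)


def info_psi_asymptotic(a, b):
    """Large-(a, b) form: (1/2) ln{ab / [2 pi (a + b)]} - 1/2."""
    return 0.5 * math.log(a * b / (2 * math.pi * (a + b))) - 0.5


def d_info_psi_da(a, b):
    """Derivative of (12) with respect to a, as quoted in the text."""
    return a * trigamma(a) - (a + b) * trigamma(a + b)


def crossed_boundary(a, b, N):
    """True once (a, b) lies on or beyond the hyperbola ab/(a + b) = N."""
    return a * b / (a + b) >= N


if __name__ == "__main__":
    for a, b in [(1, 1), (2, 1), (5, 5), (50, 50)]:
        print(a, b, info_psi(a, b), info_psi_asymptotic(a, b), d_info_psi_da(a, b))
```

On the grid of small values tried in the accompanying checks the derivative is positive throughout, which is consistent with the statement that every additional observation increases the information about ψ.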

In situations where it is appropriate to consider constancy of information about ψ, we
would often have prior knowledge to suggest that θ is small, and it would probably be
convenient to express this prior knowledge of θ by a distribution belonging to the family
(2) with a small and b large. If it then happened that a < N and b > N the boundary (14)
might conveniently be replaced by the asymptote to the hyperbola, namely a = N, without
seriously affecting the approximation. This modified sampling scheme is that advocated
by Haldane (1945) in just the circumstances we have been considering. He was concerned
with producing a sampling scheme which would yield an approximately constant coefficient
of variation for the estimate of θ. It is easy to see that for small θ this is equivalent to
requiring the standard deviation of ψ to be constant when ψ is large. That this will be so
in large samples follows immediately from the considerations in the next section.

It is possible to consider the transformation τ = ln θ, constant information about which
more closely resembles the situation considered by Haldane. This causes trouble for small
a and b but for large values the boundary is again of the form a = constant.

The three sampling schemes described here are quite different in character, as may most
easily be seen by a glance at the three figures. The first and third represent the extreme cases:
in the former one is concerned to know θ, whatever its value, to a fixed accuracy, for example
to two decimal places; in the latter the proportional accuracy is constant. The second,
giving approximately the fixed sample-size scheme, represents an intermediate requirement.
The choice of which to use depends essentially on the type of knowledge about θ that is
needed. It is interesting that no loss function is involved in the approach, yet this choice
involves considerations which are similar in form to those which would be advanced in
deciding on a loss function. The reason is that Shannon’s integral (1) expresses uniquely
the idea of information, the only choice left, so to speak, is the dθ in the integral. On the
other hand the loss function approach is distinct from one based on the integral (1), as has
been demonstrated previously (Lindley, 1956). The reader can easily construct other
schemes for special situations, using the method given here.


6. The concepts of information. The sampling rules introduced here have been constructed
using the concept of information due to Shannon. Similar schemes that have
previously been suggested have utilized the sampling variance of an efficient estimator
as the criterion whereby we can judge whether θ has been found to sufficient accuracy.
Essentially, then, the concept of information used has been Fisher’s, whose expression

$$\mathcal{I}_\theta = \mathcal{E}_x\left\{-\frac{\partial^2 L(x \mid \theta)}{\partial\theta^2}\right\} \tag{15}$$

is asymptotically equal to the inverse of the sampling variance referred to. (L(x | θ) denotes
the logarithm of the likelihood of the sample point x and $\mathcal{E}_x$ denotes the expectation taken
with respect to x.) Now these two concepts of information are totally different in character.













There are two main differences. First, Shannon’s concept involves the introduction of a
prior distribution for the unknown parameter whereas Fisher’s does not. Secondly, Shannon’s
concept, when applied to construct a sampling scheme, uses the likelihood of the
sample point x obtained (in Bayes’s theorem), whereas Fisher’s concept uses the whole
probability distribution of x for fixed θ; that is it involves not merely the x that was obtained
but also the x’s that might have been obtained. The first point is to the advantage of Fisher’s
notion, but once the idea of a prior probability for θ has been admitted the second point
makes Shannon’s concept simpler to apply.

Now despite these great differences we notice that the resulting sampling schemes are
closely similar. Thus it is well known that for samples of a fixed size the amount of Fisherian
information about φ is constant. In other words we can acquire a prescribed amount of
Fisherian information about φ using a fixed sample-size scheme. But we have just seen that,
at least when a and b are not too small, the same result obtains with Shannon’s $I_\phi$ for $\mathcal{I}_\phi$.
Again the Haldane scheme appears in just the same circumstances as it did originally when 
J, was used. We now offer an explanation of why this is so. 

Bayes’s theorem says that 


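$$p(\theta \mid x) = p(x \mid \theta)\, p(\theta)/p(x),$$

and hence, on taking logarithms,

$$L(\theta \mid x) = L(x \mid \theta) + L(\theta) - L(x).$$

Differentiation twice with respect to θ gives

$$-\frac{\partial^2 L(\theta \mid x)}{\partial\theta^2} = -\frac{\partial^2 L(x \mid \theta)}{\partial\theta^2} - \frac{\partial^2 L(\theta)}{\partial\theta^2}. \tag{16}$$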

Suppose that p(θ | x) is approximately a normal distribution, as it is in our study provided
a and b are both large, then I(p(θ | x)) is easily verified to be approximately −½ ln(2πeσ²),
where σ² is the posterior variance of θ, given x, and will, of course, depend on x. It should
be distinguished from the variance of an efficient estimator of θ. But, again, in view of the
approximate normality, σ⁻² = −∂²L(θ | x)/∂θ², and hence if we choose a sampling scheme,
or equivalently a set of points x, such that I(p(θ | x)) is constant it will follow that the
left-hand side of (16) is constant. For such a sampling scheme if we take expectations over the
sample space we shall have, from (16),

$$-\frac{\partial^2 L(\theta \mid x)}{\partial\theta^2} = \mathcal{E}_x\left\{-\frac{\partial^2 L(x \mid \theta)}{\partial\theta^2}\right\} - \frac{\partial^2 L(\theta)}{\partial\theta^2}.$$

The second term on the right-hand side is negligible in large samples and hence

$$-\frac{\partial^2 L(\theta \mid x)}{\partial\theta^2} = \mathcal{I}_\theta, \tag{17}$$

where $\mathcal{I}_\theta$ denotes Fisher’s information which is therefore constant. The same argument
persists with φ for θ. This demonstrates the equivalence of the two concepts when applied
to schemes yielding constant information.
This result establishes the asymptotic equivalence of the posterior variance of θ and the
variance of an asymptotically efficient estimator of θ. It is therefore possible to construct
the schemes using this latter variance with the asymptotically efficient estimate replacing
the unknown value of θ. Thus the variance of an asymptotically efficient estimator of θ
itself is θ(1−θ)/(a+b); replacing θ by a/(a+b) yields ab/(a+b)³, which if held constant
produces the first of our schemes.

On the other hand in small samples $\mathcal{I}_\theta$ is a difficult concept to handle. Primarily this
is because the whole sample space enters into its evaluation: in the binomial situation it is
necessary to conduct a tedious enumeration of sample paths. But even when this has been
done the solution of the equation $\mathcal{I}_\theta$ = constant presents considerable difficulties. The
statistician who refuses to recognize and use prior distributions may still find our method
attractive because of this virtue of approximating to another approach. We conclude the
paper by discussing the roles of prior distributions and Shannon’s concept in the practical
application of the schemes.


7. We first note that the prior distributions are not very obtrusive in the present study.
They do not affect the boundary of the sampling scheme, but only the point in the (a, b)
diagram from which sampling should start. In all practical applications the present author
has met the experimenter has a fair idea of the value of θ and this knowledge can usually
be expressed approximately by a member of the family of distributions given by (2). For
example if he says θ will fairly certainly lie between 10% and 20% then a and b can be found
such that p(θ) is only effective over this range. (If fairly certain is interpreted as 95%
certain, then calculations suggest a = 30, b = 175.) Even if there is considerable prior
knowledge, as in this case, experimenters often wish to be able to answer the question:
‘What has this experiment to say about the value of θ?’ They wish to make a statement
which does not depend on prior knowledge. Thus Haldane’s scheme was devised for cases
where θ is known to be small, yet the standard deviation is used to summarize the results
and the prior knowledge ignored. In other words, prior considerations enter into the design
but not the analysis. It is doubtful if the above question can be answered satisfactorily
within a logically consistent framework. Fisherian methods appear to provide an answer
but their logical basis is not clear and it is possible to produce contradictions (Mauldon,
1955; Lindley, 1957). It would be possible to provide an answer if a meaning could be given
to the phrase ‘ignorant about θ’. Although numerous attempts have been made to do this,
utilizing the ideas of invariance, none have been conspicuously successful and again
contradictions appear (Stein, 1955). Possibly ignorance about θ is as far distant in the past as
certainty about θ is into the future. In the three situations discussed in the present paper it
does not seem too unreasonable to interpret the phrase ‘ignorant about the parameter’ to
mean that the parameter has a uniform distribution over its permissible range of values.
Such an interpretation is equivalent to using the following prior distributions: for θ,
a = b = 1; for φ, a = b = ½; for ψ, a = b = 0. (The last is an improper distribution of the
type considered by Jeffreys (1948).) It will be noted that even between the two extreme
cases (θ and ψ) there is only a difference of effectively two observations: for observations
A and not-A would suffice to alter the prior distribution from a = b = 0 to a = b = 1. The
equivalent of two observations is hardly likely to be serious in practical problems.
Consequently it may be possible to meet the experimenter’s requirement by choosing a and b
somewhere around these small values even if the prior knowledge he really possesses suggests
otherwise.
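The suggestion a = 30, b = 175 for an experimenter who is fairly certain that θ lies between 10% and 20% can be checked by integrating the density (2) over that range. The Python sketch below is an illustration only and is not part of the paper; the function names and the use of Simpson’s rule are assumptions of the example.

```python
import math


def beta_density(theta, a, b):
    """Density (2), evaluated through logarithms to avoid overflow."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const
                    + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))


def prob_between(lower, upper, a, b, steps=2000):
    """Composite Simpson approximation to the prior probability of (lower, upper)."""
    if steps % 2:                       # Simpson's rule needs an even number of steps
        steps += 1
    h = (upper - lower) / steps
    total = beta_density(lower, a, b) + beta_density(upper, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_density(lower + i * h, a, b)
    return total * h / 3


if __name__ == "__main__":
    print(prob_between(0.10, 0.20, 30, 175))
```

Under these assumptions the probability assigned to the range comes out at roughly 95%, in line with the figures quoted in the text.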

Another practical consideration involves the determination of the constant representing
the amount of information required, for example the value of N in (14). This is easily done
in the present context. For situations likely to be met in practice the relationship between
the information $I_\theta$ and the posterior variance of θ, σ²(θ),

$$I_\theta = -\tfrac{1}{2}\ln\{2\pi e\,\sigma^2(\theta)\} \tag{18}$$

can be used to evaluate the constant in terms of σ²(θ), a more familiar concept. For example,
if ψ is being considered, we have from (13) and (18)

$$\tfrac{1}{2}\ln\{2\pi e(a+b)/(ab)\} = \tfrac{1}{2}\ln\{2\pi e\,\sigma^2(\psi)\}$$

and hence σ²(ψ) = (a+b)/(ab) = 1/N. In other situations a more delicate study of the
posterior distribution involving more than the variance may be necessary.
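The relation σ²(ψ) = 1/N turns a desired accuracy into the constant of the boundary (14). A minimal Python sketch follows; it is not part of the paper, the function names are hypothetical, and the figure of 0.2 for the standard deviation of ψ is chosen purely for illustration.

```python
def hyperbola_constant(sd_psi):
    """N in (14) for a desired posterior standard deviation of psi, via sigma^2(psi) = 1/N."""
    return 1.0 / (sd_psi * sd_psi)


def on_or_past_boundary(a, b, N):
    """True once ab/(a + b) has reached N, that is once sampling may stop."""
    return a * b / (a + b) >= N


if __name__ == "__main__":
    N = hyperbola_constant(0.2)         # illustrative accuracy requirement
    print(N, on_or_past_boundary(30, 151, N), on_or_past_boundary(30, 100, N))
```

With a standard deviation of 0.2 the constant is N = 25, so that when b is very large the rule approaches the asymptote a = N, that is sampling until about 25 A’s have appeared.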


REFERENCES 


HALDANE, J. B. S. (1945). Biometrika, 33, 222-5.
JEFFREYS, H. (1948). Theory of Probability, 2nd ed. Oxford: Clarendon Press.
LINDLEY, D. V. (1956). Ann. Math. Statist. 27, 986-1005.
LINDLEY, D. V. (1957). J. R. Statist. Soc. B (to appear).
MAULDON, J. G. (1955). J. R. Statist. Soc. B, 17, 79-85.
STEIN, C. (1955). Ann. Math. Statist. 26, 157.










A STATISTICAL PARADOX 


By D. V. LINDLEY 
Statistical Laboratory, University of Cambridge 


An example is produced to show that, if H is a simple hypothesis and x the result of an 
experiment, the following two phenomena can occur simultaneously: 

(i) a significance test for H reveals that x is significant at, say, the 5% level; 

(ii) the posterior probability of H, given x, is, for quite small prior probabilities of H, 
as high as 95%. 

Clearly the common-sense interpretations of (i) and (ii) are in direct conflict. The phenom- 
enon is fairly general with significance tests and casts doubts on the meaning of a signi- 
ficance level in some circumstances. 

We begin by giving the mathematical derivation of the example and later comment on
it and the assumptions involved. Let $(x_1, x_2, \ldots, x_n)$ be a random sample from a normal
distribution of mean θ and known variance σ². Let the prior probability that θ = θ₀, the
value on the null hypothesis, be c. Suppose that the remainder of the prior probability is
distributed uniformly over some interval I containing θ₀. We shall deal with situations
where x̄, the arithmetic mean of the observations, and a minimal sufficient statistic, is
well within the interval I. The posterior probability that θ = θ₀, in the light of the sample,
can be evaluated; it is

$$\bar{c} = c\exp[-n(\bar{x}-\theta_0)^2/(2\sigma^2)]/K, \tag{1}$$

where

$$K = c\exp[-n(\bar{x}-\theta_0)^2/(2\sigma^2)] + (1-c)\int_I \exp[-n(\bar{x}-\theta)^2/(2\sigma^2)]\, d\theta,$$

by Bayes’s theorem. In virtue of the assumption about x̄ and I the integral can be evaluated
as σ√(2π/n).

Now suppose that the value of x̄ is such that, on performing the usual significance test
for the mean θ₀ of a normal distribution with known variance, the result is significant at
the α percentage point. That is, $\bar{x} = \theta_0 + \lambda_\alpha\sigma/\sqrt{n}$, where $\lambda_\alpha$ is a number dependent on α
only and can be found from tables of the normal distribution function. Inserting this value
for x̄ in (1) we have the following value for the posterior probability that θ = θ₀,

$$\bar{c} = c\,e^{-\frac{1}{2}\lambda_\alpha^2}/\{c\,e^{-\frac{1}{2}\lambda_\alpha^2} + (1-c)\,\sigma\sqrt{2\pi/n}\}. \tag{2}$$


(Note that x̄ − θ₀ tends to zero as n increases so that x̄ will lie well within the interval I for
sufficiently large n.) From (2) we see that as n → ∞, c̄ → 1. It follows that whatever the
value of c, a value n can be found, dependent on c and α such that

(i) x̄ is significantly different from θ₀ at the α% level;

(ii) the posterior probability that θ = θ₀ is (100 − α)%.

This is the paradox. The usual interpretation of the first result is that there is good reason
to believe θ ≠ θ₀; and of the second, that there is good reason to believe θ = θ₀. The two
interpretations are in direct conflict, and the conflict may apparently be made even stronger by
remarking that the (100 − α)% confidence and fiducial intervals for θ just exclude θ = θ₀.
With α = 5 we are 95% confident that θ ≠ θ₀, but have 95% belief that θ = θ₀.
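The size of sample at which the conflict becomes acute can be found directly from (2). The Python sketch below is illustrative and not part of the paper: it takes formula (2) at face value, uses 1.96 as the familiar two-sided 5% point of the normal distribution, and sets c = ½ and σ = 1 purely as example values; the function names are hypothetical.

```python
import math


def posterior_null(c, n, lam, sigma=1.0):
    """Formula (2): posterior probability of the null when x-bar is lam standard errors away."""
    weight = c * math.exp(-0.5 * lam * lam)
    return weight / (weight + (1 - c) * sigma * math.sqrt(2 * math.pi / n))


def sample_size_for(c, lam, target, sigma=1.0):
    """Value of n at which formula (2) equals `target`, solved algebraically."""
    weight = c * math.exp(-0.5 * lam * lam)
    spread = weight * (1 - target) / (target * (1 - c))   # required sigma * sqrt(2 pi / n)
    return 2 * math.pi * sigma * sigma / (spread * spread)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10 ** 4, 10 ** 5, 10 ** 6):
        print(n, posterior_null(0.5, n, 1.96))
    print(sample_size_for(0.5, 1.96, 0.95))
```

With these example values the posterior probability rises steadily with n, and a sample of the order of 10⁵ observations gives a result that is just significant at 5% while leaving a posterior probability of 95% on the null value.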

In commenting on this analysis, let us first consider the assumptions involved. Many
significance tests involve situations in which the test criterion is asymptotically normally
distributed with known variance, as it is in the example, and therefore the sample considered
is in no way unusual. The only assumption that will be questioned is the assignment
of a prior distribution of any type, and, in particular, of the form chosen. A paradox will
only have been generated if we can show there exist situations where (a) a prior distribution
of this form is reasonable, and (b) a significance test of the ‘tail-area’ type is commonly used.
Let us first consider the assignment of any prior probability. The argument for the use of
prior probabilities has been put forward very cogently by Jeffreys (1948). His arguments
have, to my mind, been reinforced by those of Ramsey (1931) and, more especially, Savage
(1954). Savage’s main contribution is as follows: he lays down certain axioms that a man
should follow if he is to act in a ‘rational’ way, and defines a rational man to be a man who
acts according to these axioms. The latter are quite mild in their form and would surely
be agreed to by most statisticians. Savage then shows that a rational man must act as if
he had a prior probability distribution and (if relevant) a utility function. It does not follow
from this that any statistical inference need make overt mention of a prior distribution,
but it does follow that no inference procedure should grossly contradict the existence of
a prior distribution. (A mild contradiction may be allowable in the interests of simplicity.)
Another way of looking at this result is to say that a probability distribution is a satisfactory
measure of one’s convictions about several hypotheses. For example, if to-day we say that
our prior belief in one hypothesis is ½ it will mean the same as saying to-morrow that our
prior belief in a different hypothesis is ½; just as a yard of material to-day measures the same
as a yard of material to-morrow. If we are to use a significance level in a similar way, as
Fisher (1956, p. 43) has suggested we can, and most statisticians do, we must establish a
similar comparison property. 5% to-day must mean the same as 5% to-morrow. Our
example, we claim, shows that it need not.

So much for the general question of introducing a prior distribution. We now consider
the particular form used in deriving the paradox. We first note that the phenomenon would
persist with almost any prior probability distribution that had a concentration on the null
value and no concentrations elsewhere. For example, if there is an amount c at θ = θ₀ and
the rest is distributed throughout I according to a density p(θ), where $\int_I p(\theta)\,d\theta = 1-c$,
then if p(θ) is bounded it is easy to show, for example, by a steepest descent argument
applied to the integral corresponding to that in (1), that c̄ still tends to 1. It is sufficient
that p(θ) does not tend to infinity too rapidly as θ tends to θ₀. It is, however, essential that
the concentration on the null value exists, and it is this that has to be considered. Again
Jeffreys (1948) has discussed the point. Briefly, one argument is that the singling out of
the hypothesis θ = θ₀ to be tested is itself evidence that the value θ₀ is in some way special
and is likely therefore to be true. We should like to give two examples where this seems
unquestionably correct. The first is in genetics where θ is the linkage parameter between
two genetic factors. If there is no linkage θ = θ₀ = ½, and we are concerned with developing
a test to determine if there is any evidence for linkage. Now in this situation there is a
considerable amount of prior knowledge. For it is known that there is linkage if, and only
if, the two genes lie on the same chromosome. Consequently if there are n chromosomes of
approximately equal length, and if it seems reasonable to suppose that the gene is equally
likely to be anywhere along the chromosomes’ lengths, then it seems reasonable to suppose
a prior probability of the order of (n − 1)/n that the value of θ is ½. The particular numerical
value of the prior probability is not so important here (though we note it is rather large)
as is the fact that θ = ½ is in a singular position and will arise for most positions of the genes.
A second example arises in the telepathy experiments carried out by Soal & Bateman (1954),
where, if no telepathic powers are present, the experiment has a success ratio of θ = 1/5,
otherwise θ ≠ 1/5. A significance test for telepathy therefore should assign to θ₀ = 1/5 a
concentration of probability equal to one’s prior belief that the subject has not got telepathic
powers. This example is perhaps not as convincing as the genetical one because of the
prejudices that exist in connexion with extra-sensory perception. My point in both these
examples is that the value θ₀ is fundamentally different from any value of θ ≠ θ₀, however
near to θ₀ it might be. Unquestionably there exist situations (perhaps they are the more
common) in which this is not so; where we are interested in testing the approximate
validity of the null hypothesis, such as that the treatment has no (or very little) effect.
This point has been discussed by Hodges & Lehmann (1954).

We now consider the paradox in these situations where the prior probability exists (by
Savage’s argument) and has a concentration on the null value. We first note that the expression
of it in terms of fiducial or confidence limits used above is unjustified. The limits purport
to be statements made about the value of θ in the light of the experimental result when
initially nothing is known about or independent of knowledge of θ. The type of prior
distribution used here (suggested by the practical circumstances of the problem) certainly
does not correspond to ignorance about θ. Thus we should not be surprised at the disagreement.
The paradox merely serves as a warning that the confidence or fiducial type of
statement should only be used in those circumstances where one is truly ignorant
about the parameter. We have argued that this is not so in the telepathy or genetical
examples.

The conflict between statements of a significance level and statements based on Bayes’s 
theorem remains. Now in our example we have taken situations in which the significance 
level is fixed because, as explained above, we wish to see whether its interpretation as a 
measure of lack of conviction about the null hypothesis does mean the same in different 
circumstances. The Bayesian probability is all right, by the arguments above; and since 
we now see that it varies strikingly with n for fixed significance level, in an extreme case 
producing a result in direct conflict with the significance level, the degree of conviction is 
not even approximately the same in two situations with equal significance levels. 5% in
to-day’s small sample does not mean the same as 5% in to-morrow’s large one.

An alternative interpretation of the paradox was suggested to me by Prof. Barnard. The
posterior probability c̄, given by (2), may be written

$$\bar{c} = cf_n/\{cf_n + (1-c)\},$$

where

$$f_n = \sqrt{\frac{n}{2\pi\sigma^2}}\; e^{-\frac{1}{2}\lambda_\alpha^2},$$


the likelihood of θ₀ on the evidence of the sample. Clearly $f_n \to \infty$ as $n \to \infty$, $\lambda_\alpha$ fixed. Hence
for fixed significance level the likelihood of the null hypothesis increases indefinitely with
the sample size. This appears to me to demonstrate, without reference to prior probabilities,
the unsoundness of the suggestion that significance tests depend on the disjunction: either
a rare chance has occurred or the null hypothesis is false (Fisher, 1956, p. 39). For the chance
considered in a significance test is the chance of the observed event and other more extreme
ones. The chance of the observed event is measured by the likelihood function. These two
chances behave quite differently. In fact, the paradox arises because the significance level
argument is based on the area under a curve and the Bayesian argument is based on the
ordinate of the curve. However, the above interpretation through the likelihood involves
no mention of alternative hypotheses which seem basic to any approach to the problem.
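The contrast between area and ordinate can be made concrete. In the Python sketch below, which is not part of the paper, the tail area is held at the two-sided 5% level while the ordinate $f_n$ is computed for increasing n, and the rewritten form of the posterior probability is compared with formula (2). The function names and the example values c = ½ and σ = 1 are assumptions made for illustration.

```python
import math


def tail_area(lam):
    """Two-sided tail area of the standard normal distribution beyond lam."""
    return math.erfc(lam / math.sqrt(2))


def likelihood_ordinate(n, lam, sigma=1.0):
    """f_n: ordinate of the sampling density of x-bar at the null value."""
    return math.sqrt(n / (2 * math.pi * sigma * sigma)) * math.exp(-0.5 * lam * lam)


def posterior_from_ordinate(c, n, lam, sigma=1.0):
    """c f_n / {c f_n + (1 - c)}, the rewritten form of the posterior probability."""
    f_n = likelihood_ordinate(n, lam, sigma)
    return c * f_n / (c * f_n + (1 - c))


def posterior_formula_2(c, n, lam, sigma=1.0):
    """Formula (2) as printed, for comparison."""
    weight = c * math.exp(-0.5 * lam * lam)
    return weight / (weight + (1 - c) * sigma * math.sqrt(2 * math.pi / n))


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, tail_area(1.96), likelihood_ordinate(n, 1.96),
              posterior_from_ordinate(0.5, n, 1.96))
```

The tail area does not involve n at all, whereas the ordinate grows in proportion to √n; the two forms of the posterior probability agree to rounding error in this sketch.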

The other approach to significance testing, due to Neyman & Pearson, does envisage the 
use of alternative hypotheses and hence appears to give a reason for using the tail area 
because this region is the best one in which to reject the null hypothesis at a specified level 
of significance. Therefore the occurrence of an observation in the region is an unusual event 
on the null hypothesis and less unusual on some alternative hypotheses. But the theory 
does not justify the practice of keeping the significance level fixed, nor does it take account 
of the fact that when the observation has been made we know, not that the point has fallen 
in the region of significance, but that it has fallen exactly on the edge, and the likelihoods 
under the null and alternative hypotheses seem the relevant quantities to compare. 

The paradox is not, in essentials, new, although few statisticians are aware of it. The 
difference between the two approaches has been noted before by Jeffreys (see, in particular, 
1948, Appendix), who is the originator of significance tests based on Bayes’s theorem and 
a concentration of prior probability on the null value. But Jeffreys is concerned to emphasize 
the similarity between his tests and those due to Fisher and the discrepancies are not 
emphasized. The same phenomenon was noticed by Lindley (1953) in decision theory studies, 
and some computations by Prof. Pearson in the discussion to that paper emphasized 
how the significance level would have to change with the sample size, if the losses and prior 
probabilities were kept fixed. (The discussion based only on the latter quantities is mathe- 
matically equivalent to one in decision theory language with zero-one losses.) The present 
note considers the situation where the significance level is fixed and the variation in posterior 
probability is evaluated, rather than the other way round. 

The concept of a significance level has been used very successfully in practical problems 
of inference. One might now ask how this has come about. The answer has already been 
given by Jeffreys in the appendix already cited. Essentially it is because $\bar{c}$, as given by (2),
tends to unity very slowly and, for moderate values of $n$, $\bar{c}$ may be less than $c$ at a prescribed
significance level and the two concepts be in reasonable agreement. Let

$$A = \frac{c\,e^{-\lambda_\alpha^2/2}}{(1-c)\sqrt{2\pi}},$$

then

$$\bar{c} = \frac{A}{A + \sigma/\sqrt{n}}, \tag{3}$$

and $\bar{c} \to 0$ as $\sigma/\sqrt{n} \to \infty$. Hence in a small experiment, significance at 5% may give very
strong reasons to doubt the null hypothesis. A numerical example is informative. Suppose
we take $c = \tfrac{1}{2}$ and use a two-sided test at 5% significance so that $\lambda_\alpha = 1.96$; then $A = 0.0584$
and the table gives the value of $\bar{c}$ for different values of $t = n/\sigma^2$. If $\sigma = 1$, $t = n$, and we
see that for small samples ($n < 10$) the probability of $\theta_0$ has decreased appreciably from its
initial value of $\tfrac{1}{2}$, giving cause to doubt the validity of the null hypothesis. For medium
samples ($10 < n < 100$) the probability has only decreased a little, so that although we are
not as confident as we were initially about the null hypothesis, our doubts are not great. By
the time $n$ has reached a value about 300 $\bar{c}$ is equal to $c$; the experiment, despite its 5%
significance, has not altered our belief in the null hypothesis at all. To reach the strong
contrast put forward in the paradox it would be necessary to take $n$ about 10,000. Of
course if $\sigma$ is smaller then smaller samples will suffice. For example, if we apply these
numerical values to the Soal & Bateman problem (i.e. use the normal approximation to
the binomial) we have $\sigma^2 = \tfrac{1}{5}\cdot\tfrac{4}{5} = 0.16$, and a sample of size about 48 has $\bar{c}$ equal to the
original value of $\tfrac{1}{2}$. An experiment of this type with a run of forty-eight trials which is
significant at 5% would not alter our views on telepathy if initially we had an open mind
on the problem. The normal approximation is not adequate for samples as low as 10, but it
is clear that only such small ones would increase our prior belief at all noticeably. An experiment of 1600 trials would raise our belief that telepathy did not exist to 95%; quite a
moderate size in comparison with the 37,100 trials carried out with Mrs Stewart. The reader
may be interested to know that with $c = \tfrac{1}{2}$ the posterior probability of the null hypothesis
$\theta = \tfrac{1}{5}$ in the light of the experiments with Mrs Stewart (9410 successes) is of the order of
$10^{-149}$. The evidence for Mrs Stewart’s telepathic powers is rather strong.


      t      $\bar{c}$            t      $\bar{c}$
      1    0.055          600    0.589
      2    0.076          800    0.623
      3    0.092        1,000    0.649
      4    0.105        2,000    0.723
      5    0.116        4,000    0.787
     10    0.156        6,000    0.819
     20    0.207        8,000    0.839
     40    0.270       10,000    0.854
     60    0.312       20,000    0.892
     80    0.343       40,000    0.921
    100    0.369       60,000    0.935
    200    0.453       80,000    0.943
    300    0.503      100,000    0.949
    400    0.539     infinity    1.000
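The entries of the table can be recomputed from equation (3). The following is a minimal sketch under the values used in the text (c = 1/2, lambda = 1.96); the function names `a_const` and `posterior` are illustrative only.

```python
import math

def a_const(c, lam):
    """A = c * exp(-lam^2 / 2) / ((1 - c) * sqrt(2 pi))."""
    return c * math.exp(-0.5 * lam ** 2) / ((1.0 - c) * math.sqrt(2.0 * math.pi))

def posterior(t, c=0.5, lam=1.96):
    """Posterior probability of the null value from (3), with t = n / sigma^2."""
    a = a_const(c, lam)
    return a / (a + 1.0 / math.sqrt(t))

for t in (1, 10, 100, 300, 1_000, 10_000, 100_000):
    print(f"t={t:>7}  posterior={posterior(t):.3f}")
```

Because only t = n / sigma^2 enters, the same function covers the telepathy example: with sigma^2 = 0.16, n = 48 corresponds to t = 300.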


An apparent advantage of the significance level statement is that it does provide some 
sort of assessment of the truth of the null hypothesis using only the evidence provided by 
the experiment. It is, in effect, a convenient (though possibly misleading) summary of 
what the experimental result has to say about the null hypothesis. A similar assessment is 
available in a Bayesian analysis through the likelihood function. In the situation con- 
sidered here the function is proportional to 


$$\sqrt{\frac{n}{2\pi\sigma^2}}\,\exp\{-n(\bar{x}-\theta)^2/2\sigma^2\},$$


regarded as a function of $\theta$. This, unlike the single number expressing the significance level,
is a function and is therefore more difficult to understand. A reduction to a numerical value
is possible provided the assessment of prior probabilities conditional on $\theta \neq \theta_0$ is made. For
example, if $\theta$ is uniform in the interval $I$ in these circumstances, then

$$\exp[-n(\bar{x}-\theta_0)^2/2\sigma^2] \Big/ \int_I \exp[-n(\bar{x}-\theta)^2/2\sigma^2]\,d\theta = \sqrt{\frac{n}{2\pi\sigma^2}}\,\exp\{-\tfrac{1}{2}\lambda_\alpha^2\}$$

is the quantity by which the prior odds, $c/(1-c)$, in favour of $\theta_0$ must be multiplied in order
to obtain the posterior odds, $\bar{c}/(1-\bar{c})$. This single value, or its logarithm, might be an acceptable substitute for the significance level. It is numerically equal to Jeffreys’s $K$, since he
supposes $c = \tfrac{1}{2}$.

The paradox serves to explain one puzzling feature of tests based on Bayes’s theorem. 
Suppose the experimenter has continued sampling randomly until he has reached a result
which is, using a fixed-sample size significance test, significant at some prescribed significance level $\alpha$. That is, he has taken a sample $(x_1, x_2, \ldots, x_n)$ such that $\bar{x} = \theta_0 + \lambda_\alpha\sigma/\sqrt{n}$.
It is easy to show, by the law of the iterated logarithm, that this will happen with probability
one whatever the value of $\theta$. Then the experimenter has, of course, cheated if he quotes his
result as being significant at $\alpha$%, though, if the distribution theory were known, a valid
significance test could be made. But it would not be that appropriate to a sample of fixed
size $n$. On the other hand, it is easy to see that the likelihood of the observations $(x_1, x_2, \ldots, x_n)$
does not depend on the particular sequential stopping rule used and is, therefore, equal to
the likelihood the experimenter would have obtained if the same sample had been reached
by taking a sample of fixed size $n$. It follows that any significance test based on Bayes’s
theorem does not depend on the sequential stopping rule used, at least amongst a wide class
of such rules. In the extreme case the experimenter can go on sampling until he has reached
the significance level $\alpha$, and yet the fact that he did so is irrelevant to a Bayesian. In
telepathy this is known as ‘optional stopping’: stopping when the results look striking;
striking, that is, on a significance level criterion. The explanation is now clear. If $\theta \neq \theta_0$ the
optional stopper will reach his desired point for small $n$ and $\bar{c} < c$. On the other hand, if
$\theta = \theta_0$ the value of $n$ will be larger and $\bar{c} > c$. (These are average results, of course; naturally
sometimes mistakes will be made.) The value of $\bar{c}$ is just what one would expect in the two
cases and we see that the Bayesian will not on the average be in error in ignoring the stopping 
rule. It should now be possible to give a reliable assessment of those results in telepathy 
which have had objections raised against them on the grounds of optional stopping. 


I am much indebted to Profs. Pearson and Barnard for helpful comments on the first 
draft of this paper. 


REFERENCES 


Fisher, R. A. (1956). Statistical Methods and Scientific Inference. Edinburgh: Oliver and Boyd.
Hodges, J. L. & Lehmann, E. L. (1954). J. R. Statist. Soc. B, 16, 261-8.
Jeffreys, H. (1948). Theory of Probability, 2nd ed. Oxford: Clarendon Press.
Lindley, D. V. (1953). J. R. Statist. Soc. B, 15, 30-76.
Ramsey, F. P. (1931). The Foundations of Mathematics. London: Routledge and Kegan Paul.
Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley.
Soal, S. G. & Bateman, F. (1954). Modern Experiments in Telepathy. London: Faber and Faber.
















[ 193 ] 


STOCHASTIC CROSS-INFECTION BETWEEN TWO 
OTHERWISE ISOLATED GROUPS 


By H. W. HASKEY 


South-east Essex Technical College* 


1. Bailey (1950) and Haskey (1954) have considered, from a stochastic standpoint, the
spread of a mild infection among a single group of people into which one infectious individual
was introduced, the group being isolated. The following is a generalization: There are now
two groups of people, one with $n_1$ individuals and the other with $n_2$. At time $t=0$ we have
$h$ infectives introduced into the first group. At time $t$ we suppose there are, respectively,
$r_1$ and $r_2$ uninfected susceptibles remaining. Let the chance of one of the susceptibles in the
first group being infected in time $dt$ by an infective in the same group be $a r_1(n_1+h-r_1)\,dt$;
and by an infective in the second group be $b' r_1(n_2-r_2)\,dt$. The total chance of a fresh infection is then the sum of these quantities. Similarly, for the susceptibles in the second group,
let the chances of infection from infectives of the same or of the other group be $b r_2(n_2-r_2)\,dt$
and $a' r_2(n_1+h-r_1)\,dt$. Writing $p(r_1, r_2)$ for the probability of $r_1$, $r_2$ susceptibles at time $t$ and
considering the possible changes,

$$(r_1+1, r_2) \to (r_1, r_2); \quad (r_1, r_2+1) \to (r_1, r_2); \quad (r_1, r_2) \to (r_1, r_2)$$

in the ensuing time $dt$, we have, events in the groups being regarded as independent,

$$\begin{aligned}
\frac{dp(r_1,r_2)}{dt} = {} & -p(r_1,r_2)\,[a r_1(n_1+h-r_1) + b r_2(n_2-r_2) + a' r_2(n_1+h-r_1) + b' r_1(n_2-r_2)] \\
& + p(r_1, r_2+1)\,[b(r_2+1)(n_2-r_2-1) + a'(r_2+1)(n_1+h-r_1)] \\
& + p(r_1+1, r_2)\,[a(r_1+1)(n_1+h-r_1-1) + b'(r_1+1)(n_2-r_2)].
\end{aligned} \tag{1}$$

If the Laplace transform of $p(r_1, r_2)$ with respect to time is $q_\lambda(r_1, r_2)$, or $q(r_1, r_2)$ for brevity,
we have

$$\begin{aligned}
q(r_1,r_2)\,[\lambda + f(r_1,r_2)] = {} & (r_2+1)\,[b(n_2-r_2-1) + a'(n_1+h-r_1)]\,q(r_1, r_2+1) \\
& + (r_1+1)\,[a(n_1+h-r_1-1) + b'(n_2-r_2)]\,q(r_1+1, r_2),
\end{aligned} \tag{2}$$

where

$$f(r_1,r_2) = (a r_1 + a' r_2)(n_1+h-r_1) + (b r_2 + b' r_1)(n_2-r_2). \tag{3}$$

This holds if $r_2 < n_2$, $r_1 < n_1$. Likewise

$$q(r_1, n_2)\,[\lambda + f(r_1, n_2)] = a(r_1+1)(n_1+h-r_1-1)\,q(r_1+1, n_2) \quad (r_1 < n_1), \tag{4}$$

$$q(n_1, r_2)\,[\lambda + f(n_1, r_2)] = (r_2+1)\,[b(n_2-r_2-1) + a'h]\,q(n_1, r_2+1) \quad (r_2 < n_2), \tag{5}$$

$$q(n_1, n_2)\,[\lambda + f(n_1, n_2)] = 1. \tag{6}$$
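As a check on the structure of equation (1), the differential-difference system can be integrated numerically instead of by Laplace transforms. The sketch below is an illustration, not the method of this paper: it uses a classical fourth-order Runge-Kutta step, and the names `infection_rates`, `derivative`, `integrate` and `mean_second_group` are hypothetical, with `a_x` and `b_x` standing for a' and b'.

```python
def infection_rates(r1, r2, n1, n2, h, a, b, a_x, b_x):
    """Rates of the transitions r1 -> r1 - 1 and r2 -> r2 - 1 out of (r1, r2)."""
    i1 = n1 + h - r1                    # infectives in the first group
    i2 = n2 - r2                        # infectives in the second group
    rate1 = r1 * (a * i1 + b_x * i2)    # a and b' terms of equation (1)
    rate2 = r2 * (b * i2 + a_x * i1)    # b and a' terms of equation (1)
    return rate1, rate2

def derivative(p, params):
    """Right-hand side of (1) for every state, p[r1][r2] being a probability."""
    n1, n2 = params[0], params[1]
    d = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for r1 in range(n1 + 1):
        for r2 in range(n2 + 1):
            rate1, rate2 = infection_rates(r1, r2, *params)
            d[r1][r2] -= (rate1 + rate2) * p[r1][r2]
            if r1 > 0:
                d[r1 - 1][r2] += rate1 * p[r1][r2]
            if r2 > 0:
                d[r1][r2 - 1] += rate2 * p[r1][r2]
    return d

def integrate(t_end, params, steps=2000):
    """Runge-Kutta integration of (1) from the state (n1, n2) at t = 0."""
    n1, n2 = params[0], params[1]
    p = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    p[n1][n2] = 1.0
    dt = t_end / steps

    def shifted(x, y, s):
        return [[xv + s * yv for xv, yv in zip(xr, yr)] for xr, yr in zip(x, y)]

    for _ in range(steps):
        k1 = derivative(p, params)
        k2 = derivative(shifted(p, k1, dt / 2), params)
        k3 = derivative(shifted(p, k2, dt / 2), params)
        k4 = derivative(shifted(p, k3, dt), params)
        for r1 in range(n1 + 1):
            for r2 in range(n2 + 1):
                p[r1][r2] += dt / 6 * (k1[r1][r2] + 2 * k2[r1][r2]
                                       + 2 * k3[r1][r2] + k4[r1][r2])
    return p

def mean_second_group(p):
    """Mean number of susceptibles left in the group originally uninfected."""
    return sum(r2 * v for row in p for r2, v in enumerate(row))

# Parameter order: (n1, n2, h, a, b, a', b'); values of the example in section 2.
params = (2, 3, 3, 0.8, 0.9, 0.2, 0.2)
print(mean_second_group(integrate(0.5, params)))
```

The probabilities sum to one at every step because each term leaving a state in `derivative` is added to exactly one other state.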


2. To find the mean number uninfected in the group originally uninfected, the quantities
$m(r_1, r_2) = r_2\,q(r_1, r_2)$ are required. From (2) it follows that

$$m(r_1,r_2)\,[\lambda + f(r_1,r_2)] = j(r_1, r_2+1)\,m(r_1, r_2+1) + k(r_1+1, r_2)\,m(r_1+1, r_2), \tag{7}$$

with similar equations corresponding to (4), (5) and (6), where

$$\begin{aligned}
j(r_1,r_2) &= (r_2-1)\,[b(n_2-r_2) + a'(n_1+h-r_1)], \\
k(r_1,r_2) &= r_1\,[a(n_1+h-r_1) + b'(n_2-r_2)].
\end{aligned} \tag{8}$$

* Now at U.K. Atomic Energy Authority (Industrial Group H.Q.)

To see the nature of the solution we take $n_1=2$, $n_2=3$ and work backwards from $m(2, 3)$
to $m(0, 1)$, giving

$$\begin{aligned}
m(2,3) &= 3/[f(2,3)+\lambda], \qquad m(1,3) = 3k(2,3)/[f(2,3)+\lambda][f(1,3)+\lambda], \\
m(0,3) &= 3k(2,3)\,k(1,3)/[f(2,3)+\lambda][f(1,3)+\lambda][f(0,3)+\lambda], \\
m(2,2) &= 3j(2,3)/[f(2,3)+\lambda][f(2,2)+\lambda], \\
m(1,2) &= 3k(2,3)\,j(1,3)/[f(2,3)+\lambda][f(1,3)+\lambda][f(1,2)+\lambda] \\
&\quad + 3j(2,3)\,k(2,2)/[f(2,3)+\lambda][f(2,2)+\lambda][f(1,2)+\lambda], \text{ etc.}
\end{aligned} \tag{9}$$


Any m(r,,72) can be written from the table below: 




















by selecting all ‘approved routes’ from the cell $(2, 3)$ to the cell $(r_1, r_2)$. An ‘approved route’
starts from $(2, 3)$, runs along rows to the right and down columns. Corresponding to each
approved route to $(r_1, r_2)$ there is a term of $m(r_1, r_2)\,[f(2, 3)+\lambda]/3$ consisting of a product of
fractions of the types

$$j(s_1, s_2)/[f(s_1, s_2-1)+\lambda] = \theta(s_1, s_2), \text{ say},$$

for motion to the right and

$$k(s_1, s_2)/[f(s_1-1, s_2)+\lambda] = \phi(s_1, s_2), \text{ say},$$

for motion down a column, from the cell $(s_1, s_2)$.
There are as many terms to be added to give $m(r_1, r_2)$ as there are different approved
routes from the cell $(2, 3)$ to the cell $(r_1, r_2)$. The double sum $\sum_{r_1}\sum_{r_2} m(r_1, r_2)$, required to find
the mean number of uninfected in the group originally uninfected, is the sum of contributions
from all approved routes to every cell. These contributions can be arranged in sets, each set
being associated with one approved route to the cell $(0, 1)$. Hence

$$\sum\sum m(r_1, r_2) = 3\psi(2, 3)/[f(2, 3)+\lambda], \tag{10}$$

where $\psi(j, k)$ is defined inductively by

$$\psi(j, k) = 1 + \theta(j, k)\,\psi(j, k-1) + \phi(j, k)\,\psi(j-1, k) \tag{11}$$

and $\psi(0, 1) = \psi(1, 0) = 1$. Equation (10) can be expanded and the double sum consists of
products like (9).
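The inductive definition (11) translates directly into a recursive function. The sketch below, with hypothetical names, evaluates the right-hand side of (10) for a numerical value of lambda and compares it with the sum obtained by working backwards through (7) cell by cell; the rate constants are those of the numerical example used in this section.

```python
N1, N2, H = 2, 3, 3
A, B, A_X, B_X = 0.8, 0.9, 0.2, 0.2   # a, b, a', b'

def f(r1, r2):
    """Equation (3)."""
    return (A * r1 + A_X * r2) * (N1 + H - r1) + (B * r2 + B_X * r1) * (N2 - r2)

def j(r1, r2):
    """Equation (8), coefficient for motion to the right (r2 decreasing)."""
    return (r2 - 1) * (B * (N2 - r2) + A_X * (N1 + H - r1))

def k(r1, r2):
    """Equation (8), coefficient for motion down a column (r1 decreasing)."""
    return r1 * (A * (N1 + H - r1) + B_X * (N2 - r2))

def psi(r1, r2, lam):
    """Equation (11): 1 + theta * psi(right) + phi * psi(down)."""
    total = 1.0
    if r2 > 1:
        total += j(r1, r2) / (f(r1, r2 - 1) + lam) * psi(r1, r2 - 1, lam)
    if r1 > 0:
        total += k(r1, r2) / (f(r1 - 1, r2) + lam) * psi(r1 - 1, r2, lam)
    return total

def mean_transform(lam):
    """Right-hand side of (10), generalized to N2 * psi(N1, N2) / [f(N1, N2) + lam]."""
    return N2 * psi(N1, N2, lam) / (f(N1, N2) + lam)

def direct_sum(lam):
    """Sum of m(r1, r2) obtained by solving (7) backwards from the cell (N1, N2)."""
    m = {}
    for r1 in range(N1, -1, -1):
        for r2 in range(N2, 0, -1):
            if (r1, r2) == (N1, N2):
                rhs = float(N2)
            else:
                rhs = (j(r1, r2 + 1) * m.get((r1, r2 + 1), 0.0)
                       + k(r1 + 1, r2) * m.get((r1 + 1, r2), 0.0))
            m[(r1, r2)] = rhs / (lam + f(r1, r2))
    return sum(m.values())

print(mean_transform(1.0), direct_sum(1.0))
```

The two printed values should agree, since every approved route is counted exactly once by the recursion.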











When $n_1 = 2$, $n_2 = 3$, $h = 3$ and $a = 0.8$, $b = 0.9$, $a' = 0.2$, $b' = 0.2$ and the value of
$f(r_1, r_2)$ at each cell is inserted within it, we have

              r_2 = 3    r_2 = 2    r_2 = 1
   r_1 = 2      6.6        8.2        8.0
   r_1 = 1      5.6        6.8        6.2
   r_1 = 0      3.0        3.8        2.8
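The table follows from equation (3) with the stated constants. A short sketch (the function name `f` and its keyword defaults are chosen here for illustration) reproduces the nine entries and confirms that no two of them coincide, which is what keeps the partial fractions simple in this example.

```python
def f(r1, r2, n1=2, n2=3, h=3, a=0.8, b=0.9, a_x=0.2, b_x=0.2):
    """Equation (3); a_x and b_x stand for a' and b'."""
    return (a * r1 + a_x * r2) * (n1 + h - r1) + (b * r2 + b_x * r1) * (n2 - r2)

table = {(r1, r2): f(r1, r2) for r1 in (2, 1, 0) for r2 in (3, 2, 1)}
for r1 in (2, 1, 0):
    print(r1, [round(table[(r1, r2)], 1) for r2 in (3, 2, 1)])

# All nine values are distinct, so only simple poles occur.
all_distinct = len({round(v, 6) for v in table.values()}) == len(table)
print(all_distinct)
```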




















Each term of the double sum has then to be resolved into partial fractions, becoming
$\sum a_{r_1 r_2}/[f(r_1, r_2)+\lambda]$, and the different terms collected; for example, $a_{23}/(6.6+\lambda)$,
$a_{22}/(8.2+\lambda)$, and the coefficients $a_{r_1 r_2}$ computed. Corresponding to these we get inverse
Laplace transforms $a_{23}e^{-6.6t}$, etc., and their sum gives the mean susceptibles $M$ at time $t$.
Then $-dM/dt$ is the number of ‘cases’ at time $t$.

The above is a relatively simple case, though the computations are lengthy, since there
are no two values of $f(r_1, r_2)$ alike. When there are repeated values, terms involving $(k+\lambda)^{-1}$,
$(k+\lambda)^{-2}$, $(k+\lambda)^{-3}$, etc., appear in the double sum and the calculation of their coefficients is
involved. For the numerical case cited above we obtain


               Term in mean
               number uninfected        Term of ‘cases’
   f(2, 3)      -9.8791 e^{-6.6t}        -65.202 e^{-6.6t}
   f(1, 3)      -4.9934 e^{-5.6t}        -27.963 e^{-5.6t}
   f(0, 3)     -99.6923 e^{-3.0t}       -299.077 e^{-3.0t}
   f(2, 2)       0.6787 e^{-8.2t}          5.565 e^{-8.2t}
   f(1, 2)       3.5457 e^{-6.8t}         24.111 e^{-6.8t}
   f(0, 2)      14.3491 e^{-3.8t}         54.526 e^{-3.8t}
   f(2, 1)      -0.8242 e^{-8.0t}         -6.593 e^{-8.0t}
   f(1, 1)      10.8647 e^{-6.2t}         67.361 e^{-6.2t}
   f(0, 1)      88.9412 e^{-2.8t}        249.035 e^{-2.8t}


The sum of the terms in the first column at t = 0 is 3, as it should be, and the sum of the terms 
in the second column for various values of t gives cases as follows: 


   t        0      0.1    0.3    0.5    0.6    1.0
   Cases    1.76   2.21   2.67   2.54   2.37   1.44


Thus we have a case curve which rises steadily to a maximum and then decays to zero as 
time elapses. 
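The tabulated terms can be summed mechanically. In the sketch below the coefficients are copied from the first column of the table of terms (the entry for f(1, 3) is taken as 4.9934, the value implied by 27.963 / 5.6), and the 'cases' are obtained as minus the derivative of the mean; the names `TERMS`, `mean_susceptibles` and `cases` are illustrative. Because the coefficients are rounded and nearly cancel, the sums agree with the printed values only approximately.

```python
import math

# (exponent f(r1, r2), coefficient of exp(-f t) in the mean number uninfected)
TERMS = [
    (6.6, -9.8791), (5.6, -4.9934), (3.0, -99.6923),
    (8.2, 0.6787), (6.8, 3.5457), (3.8, 14.3491),
    (8.0, -0.8242), (6.2, 10.8647), (2.8, 88.9412),
]

def mean_susceptibles(t):
    """M(t): sum of the terms of the first column."""
    return sum(c * math.exp(-rate * t) for rate, c in TERMS)

def cases(t):
    """-dM/dt: each term of M multiplied by its exponent."""
    return sum(rate * c * math.exp(-rate * t) for rate, c in TERMS)

for t in (0.0, 0.1, 0.3, 0.5, 0.6, 1.0):
    print(f"t={t:.1f}  M={mean_susceptibles(t):.3f}  cases={cases(t):.2f}")
```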


3. I have been unable to obtain a general formula for $\sum\sum m(r_1, r_2)$ in the general case,
since this expression involves sums of terms like those in a hypergeometric series except that
the factors in these terms are quadratic in $r_1$, $r_2$ and in general without factors.

While the mathematical model of infection used here is likely to be approximately true
in general, there are good epidemiological grounds for thinking that it applies only to small
groups for which $n_1$, $n_2$ do not exceed five, and that instead of two cross-infecting groups
there are many. Even for two groups of four the full expression for $\sum\sum m(r_1, r_2)$ in the contracted notation of equation (10) is a yard long. Leaving aside the question of three cross-infecting groups, which would involve a three-dimensional group of cells instead of the
two-dimensional group in §2, there is a special case for which a general formula can be
obtained for the mean susceptibles in either group for any values of $n_1$, $n_2$. This arises when
$a$, $b$, $a'$, $b'$ are equal. We imagine constant fractions of the infectives and the uninfected
susceptibles in each group travelling to the other for a given infinitesimal interval. They
then return to their own group for another different given infinitesimal interval and follow
this by another visit to the other group and so on. This gives (1) but depends on the
doubtful procedure of splitting infinitesimal time intervals, but if it is granted, then the
special case corresponds to the state of affairs in which only the susceptibles, and all of
them, travel to and fro, spending half their time in each group, or alternatively when only
the infectives, and all of them, travel, spending equal times in each group. Choosing the
unit of time to make $a = 1$, we consider this special case.

For simplicity taking $h=1$, and $n_2 = n_1+1 = n$, $f(r_1, r_2)$ factorizes into $(r_1+r_2)(2n-r_1-r_2)$
and the table for $f(r_1, r_2)$ is:












































                r_2 = n        r_2 = n-1        ...    r_2 = 2          r_2 = 1
   r_1 = n-1    (2n-1).1       (2n-2).2         ...    (n+1).(n-1)      n.n
   r_1 = n-2    (2n-2).2       (2n-3).3         ...    n.n              (n-1).(n+1)
   ...
   r_1 = 0      n.n            (n-1).(n+1)      ...    2.(2n-2)         1.(2n-1)


























This shows that there are many repeated entries but that any approved route from the
beginning of the epidemic, $(n-1, n)$, to its end, $(0, 1)$, passes through each of the different
possible values of $f(r_1, r_2)$ twice and twice only.

We now write $x(2n-x) = Q(x) = \mathbf{x}$, say, with the understanding that whenever several
clarendon symbols are connected by signs the complete set of them forms the argument
for $Q$. Thus $\mathbf{x-y} = Q(x-y)$, not $Q(x) - Q(y)$. It then follows that


$$f(r_1, r_2) = \mathbf{r_1+r_2},$$
$$(\mathbf{2n-s-m-1}+\lambda)\,\theta(n-m, n-s) = (n-s-1)(s+m),$$
$$(\mathbf{2n-s-m-1}+\lambda)\,\phi(n-m, n-s) = (n-m)(s+m),$$


L=Um(7r1, 72) = an = re : +55? 1 {y(n—1,n—1)+p(n—-2, n) . 


and by repeated use of 


and 


$$\psi(r_1, r_2) = 1 + \theta(r_1, r_2)\,\psi(r_1, r_2-1) + \phi(r_1, r_2)\,\psi(r_1-1, r_2),$$



















we find that 
n 2n—2}.1 
Balt, te) = 5 al tere 
pans ( 2.{2n — 3} 1.{2n — 2} 
«(14S 1+...4 Fy [a+ i" ..)}}. (12) 


Since $\mathbf{x} = \mathbf{2n-x}$, all such terms as $\mathbf{x}+\lambda$ occur twice in denominators, except $\mathbf{n}+\lambda$, which
occurs once only. The coefficients of $(\mathbf{r}+\lambda)^{-1}$ and $(\mathbf{r}+\lambda)^{-2}$ in the expansion of (12) in partial
fractions must be found, and we can, without loss of generality, consider all values of $r<n$
and then the isolated value $r=n$.

All the terms containing $(\mathbf{r}+\lambda)^{-2}$ and not merely $(\mathbf{r}+\lambda)^{-1}$ are included in $T_r$, where
$T_r = A_r B_r$ and
aes n(2n — 2)! (2n—r--1)! 
7 (r—1)!(1+A)(2+A)...(r—1 +A) [(r +A) (r+14+A)...(n—1+A)}*(n4+A)’ 


Bea fie CoD CoM OnmreM (yg 2H B, ,1.00—2) | 














(13) 
and in partial fractions 7), is 


Cry (L HA) +... +0,-/ (PEA) +... HOpy/(M +A) 4+,q/(1 +A)? 4+... +d, n1/(M—14+A)?, (14) 
where d,, = T(r+A)|,—» on = 2 Met MB ete. (15) 


Noting that n™ = n(n—1)...(n—7r+1), the series B, becomes 


ae P(r—1)9(r—2n)\M  (—)2 (r—1)® (r — 2n)® 
r—1+A + "@—14A)(r¥—2 +A) 





+... to r terms. (16) 
It is found that 
(A+r)? A, |, = (—)?2 2n(n—1) (2n— 2)!/[(r—1)! (2n—2r—1)! [r=1,2,...,(n—1)], 














(17) 
(= (r= 1) (r= 20) | (=)2 (r= 1) (r— 2) 
and =s BB |,» = 1+ Ti @r—2n 1) + 2 (Or —In—1)® +... to r terms 
= F(1l—r, 2n—r1r; 2n—2r+1; 1) 
= (—)r-1(2n—2r)! (r—1)!/(2n-—r—1)!. (18) 
Hence d,, = 4n(n—1)? (2n —2)!/(r—1)!(2n—r—1)!, (19) 
B, ! 
further é, -|B a fA, (A+r)*}+A,(A+r)? Rl 
S [AMA +E) (r— 1 (L4+A) (2 +A)... (F142) si 
x [((r+1+A)(r+2+4+A)...(m—1+A)]}?(m+A) 
= —n(2n—2)! 2n—r—1)!['S, 14k) + 14m) +2 = 1/a+b}. 
k=1 k=r+1 
Setting A = —r it is found that 
[ir— IP @n—2r— [5 A ied 
A--t 
2n—r—1 2n—2r—1 
= (-Yn(an—2)!{—"F te Sts yal (21) 
2n—2r+1 z=1 












































198 Stochastic cross-infection between two otherwise isolated growps 
If r=1, the first two sums of reciprocals in (21) are omitted. If thi 
Since the evaluation of 0B,/0A by differentiation of each term of B, above leads to an there 
awkward expression, the general term of B,, viz. 
(—)* (r—1)™ (r — 2n)/(r —1+A) (r-—2+A)...(r-—S+1+A)(r—x+A) (r—l2>zx2Xs), = si 
may be put into partial fractions as follows = 
& _ A-F(r—1) (r— 20) (r—n—-8) 1 93 
soi (— 81 (s—1)! (wv —8)! (27 —1-2n—8)™(r—s+2)’ (23) i 
On differentiating this with respect to A and putting A = —r, there results 
, -P“¢~-iPo-Bry-s-)) le ) wher 
4 88! (2 —8)! (2r —2n—8+ 1) (2r—2n—s8)? — 2 Hx,8), sey, (24) the i 
‘ OB, r-1l z - Al 
so that OA ‘to = 2» = g(x, 8). (25) Ups V 
r—1r—1 
This double sum may be expressed as }) 5 g(x,8), or as t= (Q@n—1 
s=l1z2=s8 
rol (—)92(r—n—8) ((—)° (r—1) (r— 2) | (—)*+4 (r— 1) (r— 2nyo bla 02% 
<=, 88! (2r—2n—s8)?| 0! (2r—2n—8— 1) 1! (2r—2n —s— 1) (Qn. 
(—)r-4 (r—1)¢-® (r — 2n)r— 
+ SE ee and 
tl Wr—n—s)(r—1)(r—2n)) = (v-r—1-# ( — 4 (r—2n—8) (r —1— 8) (- 
~ ga 88! (2r—2n—s8)?(2r—2n—s—1)\ y! (2r —2n —1— 28) 
A x 2(r —n —8) (r— 1) (r—2n) F(s+1—r, 2n—r+8; 2n+28—2r+1; 1) Gat! 
’ = ss! (2r—2n —s8)*? (2r —2n —8— 1) susc 
r—1 
= Db Ar—n—s) (r—1)! (r—2n)/(2r — 2n — 28 — 1)*-1-® (2r — 2n — 8 — 1) (2r — 2n — 8)? 8s!. s 
s=1 
r=1 
Since (2r —2n—s—1)°-) = (2r —2n —8 — 1) (27 — 2n — 28 — 1)"-1-9), 
' the above becomes 
-1 
2(r— !'S (r —n—8) (r — 2n)®/(2r — 2n —8 — 1)» (2r — 2n — 8)? 88!. 
s=1 
Further, as 
(2r —2n — 1)°-D (r — 2n)™ = (2r —2n —1)@+7-Y = (2r — 2n — 1) (2r — 2n— 8 — 1)*-2, | 
we have ; 
r—1 | 
(2r—2n—1)r-0 2B] or—11S (27 —2n—1)"-D(r—n—s)/a(2r—2n—8)a!  (r> 2). fen 
A=-t s=1 wit 
(26) 
If r=2 ~— 2(n — 1)/(2n —3)2 : 
r=4, : ( Sie (n— )/( = ys is Si 
Expressing 2(r—n—s)/s(2r—2n—s8) as 1/s+1/(p+s), 
where p = 2n — 2r, we have 
OB, rt are 
= = > {l/s+1/(p+s)}(—)-*(r—1) (r—2)... (8 + 1)/(p +8) (p+s4+1)...(ptr—1). 
OA |jat = Th 










If this is written out as two series, putting the terms in the order for which s = r—1,r—2,..., 
there results 





ny -| Sy. Bae... ae 2 (r —1) (r—2) : 
ee: pray (r—2)(p+r—2)* (r—3)(p+r—2)(p+r—3) to (r—1) terms 
(r—1) (r—1) (r—2) 





s 


+ ptr—1 (p+r—2) (p+r—3)?(p+r—2) het terms| 


1 
=—(p+r—1)4 | [(uPtr-2 + ar-?) {F(1—r,1; —p—r+2; —2)—L]dvt(r+1)1+(pt+r—- 13] ; 
0 
(27) 
} — where the suffix J in the hypergeometric indicates that its last term is to be omitted. If r=2 
the integral is to be omitted and if r=1, 0B,/0A=0. 


All the terms in 2X=m(r,,r,) containing an unrepeated factor (r+A)-1 are included in 
u,, where 





n(2n —2)!(r—1)! 
}" @n—1+a) (2n--2+A)...(2n—r+A) (2n—r—-1)! 


(2n—r—1)r : oe i ot BGM, + Denney.) (r<n), 

















) r+2+A r+1+A 


{ 
pit (Qn—r—2+A) 


| (2n—r—1+A) 


and the coefficient of (r + A)- in this is u,(A+r) |,__,, which simplifies to 
(—)*-1 n(2n — 2)! (2n — 2r)! F(r,r+1—2n; 2r+1—2n; 1)/[((Qn—r—1)! (rv = 1,2,..., 0). 


Gathering the terms and taking the inverse Laplace transforms, the mean number of 
susceptibles in the group originally uninfected at time ¢ is 


> (= yt man —2)! @n— 27) Fr 7 +1 — 2m; 2r+1—2n; 1)/[(2n—r—1)!} 
=1 


— 2n(n—1) (2n—2)! [-"s zt+4 ~~ 214 2 etl [(en—r— 1)!(r—1)! 
z=1 z=2n—2r+1 1 
(—)" 2n(n —r) (2n — 2)! 
+ T(r 1)!P @n—2r—1)! (Qn —r—]) 








x [cate a8) (RI =r, 1;2—p—r; —a-)—1}]dx+(r—- 11+ (2n—r—1y} 
J0 


+4n(n—r)2(2n—2)!4/(r—1)! 2n—r—1)!] exp[=r(2n—) 4], (30) 
from which the mean number of cases can be obtained. In computations it is better to work 
with (12) and use (30) as a check. 


4. The method of finding the mean number of susceptibles in the group, numbering $n_1$,
is similar to that in §2, for equation (1) holds and new quantities

$$n(r_1, r_2) = r_1\,q(r_1, r_2),$$
$$j'(r_1, r_2) = r_2\{b(n_2-r_2) + a'(n_1+h-r_1)\},$$
$$k'(r_1, r_2) = (r_1-1)\{a(n_1+h-r_1) + b'(n_2-r_2)\},$$

are defined, the primes indicating that they relate to the place of origin of the epidemic.
The procedure is as before, but now $r_2$ runs from $n_2$ to 0 and $r_1$ from $n_1$ to 1.








200 Stochastic cross-infection between two otherwise isolated groups 
If n,=3, nN, = 2, h= 2. a=, b=, a’ =}, b’ =}, we have 


Table of f(r,, *a) Table of 9’(r,, *2) Table of k’(7r1, 72) 
T%> 3 2 0 T> 3 2 0 %,> 3 2 1 0 
Md ae ee kay nella oa a oe ie a 
se a oe...) 2, ae ee ‘+ t- 8s 6 @ 
Hence : 
YU 27,72) 
, = 2(2, 3)/[f(2, 3) +A] = 2[1 + (2, 3) Y(2, 2) + A(2, 3) W(1, 3)]/[F(2, 3) +A] = ete., 
giving 
" 2 1 34 3 3 
ZrzN(7r,, 72) = ial! +apcal!+ega(!+epal tan 








24 3 |+scataaalaeatan) ratte 
Toe te gy) 5+A TE+A\TE4+A_ 5+A TZ+A ata 
The coefficients of 1/(74+A), 1/(74+A)? in the above are those of e~# and te-7# in the 
expression M for the mean number of susceptibles, which is 


M = 35-2079e-% — 54-0979e—44 + 17-9143e—! + 3-75-68 
+ 27-657 Le-74! + 1-5179e-* + (6-4059¢ — 29-9493) e-73!, (31) 


yielding the following table: 


Cases=-—dMJdt 3 2:84 2-52 2-08 1:46 0-71 0-04 , 
t 0 0-18 0-30 0-42 0-60 0-90 Ga | — 


The cases show a steady decline with elapse of time. For the group to which the epidemic 
originally spreads when n, = 3, n,=2,h=1,a=3, b=3, a’ =}, b’ =}, the cases are given by a 
formula similar to (31) yielding the result: 


Cases 1 2-299 2-523 2-476 1-916 0-469 ae (33) 


t 0 0:3 0-54 0-6 0:9 1-2 3-0 J- 


It is seen that the cases in this group rise slowly to a maximum at about time $t = 0.55$ and
then decline slowly to zero.

When the cases in both groups are totalled for given times the case-time curve is similar
to that of a self-infecting community, rising to a maximum and declining rapidly thereafter.
The maximum total cases occur at about time $t = 0.24$, at which time the cases for a single
self-infecting group of the same total size is a maximum, but this self-infecting group has 
a higher maximum than the total cases in the two cross-infecting groups. The same results 
have been found for different values of a, b, a’, b’, except when a=b=a’'=b’, when the 
total cases in two cross-infecting groups is the same as for a single self-infecting group of 
the same total size. 


5. The individual probabilities $p(r_1, r_2)$ can be calculated by the foregoing method. This
is illustrated for the special case, taking $n_2 = 3 = n_1+1$ and considering the group originally
uninfected. $\sum q(r_1, r_2)$ now includes the quantity $q(1, 0)$, since $\sum_{r_1} p(r_1, 0)$ is found from
$q(2, 0)+q(1, 0)+q(0, 0)$. It is found that

$$\sum_{r_1} p(r_1, 0) = 1 + (29-48t)e^{-5t} + (7-84t)e^{-8t} - 37e^{-9t}, \tag{a}$$
$$\sum_{r_1} p(r_1, 1) = (-37+48t)e^{-5t} + (120t-20)e^{-8t} + 57e^{-9t}, \tag{b}$$
$$\sum_{r_1} p(r_1, 2) = 6e^{-5t} + (15-36t)e^{-8t} - 21e^{-9t}, \tag{c}$$
$$\sum_{r_1} p(r_1, 3) = 2e^{-5t} - 2e^{-8t} + e^{-9t}. \tag{d}$$

If these probabilities are plotted against time on the same axes it is seen that (a) rises steadily
from zero to unity while (d) decreases steadily from unity to zero. (b) and (c) rise from zero
to maxima of about 0.3, 0.32 respectively, attained at times 0.5, 0.24 respectively, and
then decrease to zero.

At any time ¢ the graph which has the greatest ordinate shows the most probable number 
of susceptibles, ¢, and it is found that 


   Time                                  0-0.32    0.32-0.37    0.37-0.45    0.45 onwards
   Most probable number                     3          2            1             0
   Approx. prob. at transition times            0.3        0.278        0.295

Observational data would be obtained in this form.
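The statements about (a) to (d) can be checked by direct evaluation. The sketch below assumes the exponents 5, 8 and 9, which are the values of (r_1 + r_2)(6 - r_1 - r_2) arising when n_1 = 2, n_2 = 3, h = 1 and all rate constants equal unity; the names `marginals` and `most_probable` are hypothetical.

```python
import math

def marginals(t):
    """Probabilities of 0, 1, 2, 3 susceptibles in the group originally uninfected."""
    e5, e8, e9 = math.exp(-5 * t), math.exp(-8 * t), math.exp(-9 * t)
    return {
        0: 1 + (29 - 48 * t) * e5 + (7 - 84 * t) * e8 - 37 * e9,   # (a)
        1: (-37 + 48 * t) * e5 + (120 * t - 20) * e8 + 57 * e9,    # (b)
        2: 6 * e5 + (15 - 36 * t) * e8 - 21 * e9,                  # (c)
        3: 2 * e5 - 2 * e8 + e9,                                   # (d)
    }

def most_probable(t):
    """Number of susceptibles whose curve has the greatest ordinate at time t."""
    probs = marginals(t)
    return max(probs, key=probs.get)

for t in (0.1, 0.24, 0.35, 0.40, 0.5, 0.6):
    probs = marginals(t)
    print(f"t={t:.2f}  most probable={most_probable(t)}  "
          + "  ".join(f"{probs[r]:.3f}" for r in (3, 2, 1, 0)))
```

The four probabilities sum to one at every t, which gives a quick check on the coefficients.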

6. A convenient expression for the total mean number of susceptibles in the two groups
can also be obtained when $f(r_1, r_2)$ factorizes. By way of illustration we now write
$Q(x) = x(n_1+n_2+h-x)$ as $\mathbf{x}$, then

$$f(r_1, r_2) = (r_1+r_2)(n_1+n_2+h-r_1-r_2) = \mathbf{r_1+r_2}.$$

The Laplace transform of the total mean susceptibles in the groups is

$$\sum\sum l(r_1, r_2) = \sum\sum (r_1+r_2)\,q(r_1, r_2);$$

and writing $N = n_1+n_2+h$,

$$(r_1+r_2)\,j''(r_1, r_2) = r_2(r_1+r_2-1)(N-r_1-r_2),$$
$$(r_1+r_2)\,k''(r_1, r_2) = r_1(r_1+r_2-1)(N-r_1-r_2), \tag{34}$$

there results

$$[f(r_1, r_2)+\lambda]\,l(r_1, r_2) = l(r_1, r_2+1)\,j''(r_1, r_2+1) + l(r_1+1, r_2)\,k''(r_1+1, r_2). \tag{35}$$

The table for $f(r_1, r_2)$ is

                  r_2 = n_2            r_2 = n_2-1           r_2 = n_2-2
   r_1 = n_1      h(n_1+n_2)           (h+1)(n_1+n_2-1)      (h+2)(n_1+n_2-2)
   r_1 = n_1-1    (h+1)(n_1+n_2-1)     (h+2)(n_1+n_2-2)      ...
   r_1 = n_1-2    (h+2)(n_1+n_2-2)

in which all entries in diagonals running from top right to bottom left are the same. The
solution of the difference equation (35) gives, by the usual method,

$$[\lambda+f(n_1, n_2)]\sum\sum l(r_1, r_2)/(n_1+n_2) = \psi(n_1, n_2) = 1 + \theta(n_1, n_2)\,\psi(n_1, n_2-1) + \phi(n_1, n_2)\,\psi(n_1-1, n_2),$$

where $\theta(n_1, n_2) = j''(n_1, n_2)/[f(n_1, n_2-1)+\lambda]$, $\phi(n_1, n_2) = k''(n_1, n_2)/[f(n_1-1, n_2)+\lambda]$.










Inserting the values of y(n, n.—1), y(n,—1, v2) and of their successors there results 





tte 1 ie etapa EL. 
ZLl(r1,72) = [t+ ppmreal(™ h \h+ SRG 


1 
x (ww —a— 1) (N-—h—2)h(h+ 1)+Noho3 ga" 
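continued until the term $\mathbf{1}+\lambda$ appears in the denominator. If $n_1=n_2=n$ and $h=1$,
remembering that now $\mathbf{x} = x(2n+1-x)$, we find that

$$\sum\sum l(r_1, r_2) = 2n\left[\frac{1}{\lambda+\mathbf{1}} + \frac{(2n-1)!\,1!}{(2n-2)!\,(\lambda+\mathbf{1})(\lambda+\mathbf{2})} + \ldots + \frac{(2n-1)!\,(p-1)!}{(2n-p)!\,(\lambda+\mathbf{1})(\lambda+\mathbf{2})\ldots(\lambda+\mathbf{p})} + \ldots\right]. \tag{36}$$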

If the differential-difference equations for a single self-infecting community of $2n$ persons
and one infectious person are written down (Bailey’s equation (5) with $n$ replaced by $2n$)
and $P(r)$ replaces $rp(r)$, we have

$$\frac{dP(r)}{dt} = r(2n-r)\,P(r+1) - r(2n+1-r)\,P(r), \qquad \frac{dP(2n)}{dt} = -2n\,P(2n).$$

Taking the Laplace transforms and noting $\mathbf{r} = r(2n+1-r) = \mathbf{2n+1-r}$,

$$(\mathbf{r}+\lambda)\,Q(r) = r(2n-r)\,Q(r+1) \quad (r < 2n), \qquad (\mathbf{2n}+\lambda)\,Q(2n) = 2n, \tag{37}$$

whence

$$\sum_r Q(r) = 2n\left[\frac{1}{\lambda+\mathbf{1}} + \frac{(2n-1)!\,1!}{(2n-2)!\,(\lambda+\mathbf{1})(\lambda+\mathbf{2})} + \frac{(2n-1)!\,2!}{(2n-3)!\,(\lambda+\mathbf{1})(\lambda+\mathbf{2})(\lambda+\mathbf{3})} + \ldots\right]. \tag{38}$$

From (36) and (38) it is seen that $\sum\sum l(r_1, r_2)$ is identical with $\sum_r Q(r)$. Hence the solution,
obtained by the present writer for $\sum Q(r)$ in a self-infecting group, can be used for this case
of cross-infection. Hence the total mean susceptibles in the two cross-infecting groups is


n ! 2r) 
meet -—* : HK 2n+1—2r)— 


p ee "Det ent 1 - 2) exp[—r(2n+1—r)t]. (39) 


z=r 


7. If $n$ in (39) is large, computations to find the number of cases at any time are lengthy,
and even if $n$ is small the general formula for cases, resembling (31), is not informative of
the nature of the graphs $M$, $-dM/dt$ against $t$. In this section it is proposed to develop an
approximation to the total case curve for the two cross-infecting groups, considered in §6.
From (36)

$$Q(r) = \frac{(2n)!\,(2n-r)!\,[(1+\mathbf{r}/\lambda)(1+\{\mathbf{r+1}\}/\lambda)\ldots(1+\mathbf{2n}/\lambda)]^{-1}}{(r-1)!\,\lambda^{2n+1-r}}, \tag{40}$$

and the Laplace transform of the total mean number of susceptibles is $\sum_{r=0}^{2n} Q(r)$ or

$$\begin{aligned}
& 2n[1+\mathbf{2n}/\lambda]^{-1}/\lambda + 2n(2n-1)\,[(1+\mathbf{2n}/\lambda)(1+\{\mathbf{2n-1}\}/\lambda)]^{-1}/\lambda^2 \\
& \quad + 2n(2n-1)(2n-2)\,2!\,[(1+\mathbf{2n}/\lambda)(1+\{\mathbf{2n-1}\}/\lambda)(1+\{\mathbf{2n-2}\}/\lambda)]^{-1}/\lambda^3 + \ldots
\end{aligned} \tag{41}$$

















Since $\lambda$ is greater than any of the $\mathbf{r}$, the binomials may be expanded and the coefficients of
powers of $\lambda^{-1}$ in them are the sums of homogeneous products of appropriate dimensions of
one or more of the quantities $\mathbf{2n}$, $\mathbf{2n-1}$, $\mathbf{2n-2}$, etc. For the first few powers of $\lambda^{-1}$ we have

$$\begin{aligned}
\sum_{0}^{2n} Q(r) = 2n[ & \lambda^{-1} - \lambda^{-2} - 2(n-1)\lambda^{-3} - 4(n^2-4n+2)\lambda^{-4} + 8(-n^3+11n^2-19n+7)\lambda^{-5} \\
& - 16(n^4-26n^3+107n^2-123n+38)\lambda^{-6} + \ldots].
\end{aligned} \tag{42}$$

As $\lambda^{-n}$ has an inverse Laplace transform with respect to time of $t^{n-1}/(n-1)!$, the mean
susceptibles at time $t$ is

$$\begin{aligned}
2n[ & 1 - t - (n-1)t^2 - 2(n^2-4n+2)t^3/3 + t^4(-n^3+11n^2-19n+7)/3 \\
& - 2t^5(n^4-26n^3+107n^2-123n+38)/15 + \ldots],
\end{aligned} \tag{43}$$

and mean cases are

$$\begin{aligned}
2n[ & 1 + 2(n-1)t + 2(n^2-4n+2)t^2 - 4t^3(-n^3+11n^2-19n+7)/3 \\
& + 2t^4(n^4-26n^3+107n^2-123n+38)/3 \ldots].
\end{aligned} \tag{44}$$

This is quicker than the method given by Bailey (equations (15)-(23)), but only slowly
convergent, and a practical alternative to the expansion of the exponentials, $e^{-\mathbf{r}t}$, with their
formidable coefficients.
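A numerical comparison shows how the truncated series behaves for small t. The sketch below evaluates (43) and (44) as far as the terms given and sets them against the closed expressions for n = 2 and n = 3 quoted in this section. One assumption is flagged: the n = 2 expression, (-23 + 36t)e^{-4t} + (24 + 18t)e^{-6t}, equals unity at t = 0, so it is multiplied here by 2n = 4 to put it on the scale of (44); that rescaling is a reading of the formulas, not something stated in the text. All function names are illustrative.

```python
import math

def series_mean(n, t):
    """Mean susceptibles from (43), truncated after the t^5 term."""
    return 2 * n * (1 - t - (n - 1) * t ** 2
                    - 2 * (n ** 2 - 4 * n + 2) * t ** 3 / 3
                    + t ** 4 * (-n ** 3 + 11 * n ** 2 - 19 * n + 7) / 3
                    - 2 * t ** 5 * (n ** 4 - 26 * n ** 3 + 107 * n ** 2
                                    - 123 * n + 38) / 15)

def series_cases(n, t):
    """Mean cases from (44), truncated after the t^4 term."""
    return 2 * n * (1 + 2 * (n - 1) * t
                    + 2 * (n ** 2 - 4 * n + 2) * t ** 2
                    - 4 * t ** 3 * (-n ** 3 + 11 * n ** 2 - 19 * n + 7) / 3
                    + 2 * t ** 4 * (n ** 4 - 26 * n ** 3 + 107 * n ** 2
                                    - 123 * n + 38) / 3)

def closed_cases_n2(t):
    """Quoted n = 2 expression, rescaled by 2n = 4 (assumption, see lead-in)."""
    return 4 * ((-23 + 36 * t) * math.exp(-4 * t) + (24 + 18 * t) * math.exp(-6 * t))

def closed_cases_n3(t):
    """Quoted n = 3 expression, used as printed."""
    return ((1140 + 720 * t) * math.exp(-12 * t)
            + (2700 * t - 645) * math.exp(-10 * t)
            + (900 * t - 489) * math.exp(-6 * t))

for t in (0.0, 0.02, 0.05, 0.1):
    print(f"t={t:.2f}  n=2: {series_cases(2, t):.4f} vs {closed_cases_n2(t):.4f}"
          f"   n=3: {series_cases(3, t):.4f} vs {closed_cases_n3(t):.4f}")
```

Agreement is close for very small t and deteriorates as t grows, in keeping with the remark that the series converges only slowly.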

It has been found that the case curve cannot be approximated to satisfactorily by one
or even two curves of the type $(a+bt)e^{-ct}$ of which it is composed. Since it appears to be
nearly a truncated normal curve, the first three terms of (44) were supposed to commence
the expansion of $k\exp[-a(t-b)^2]$ giving $a = 4n-2$, $b = (n-1)/(4n-2)$ which can then
be written

$$\begin{aligned}
& 2n\exp\{(n-1)^2/(4n-2)\}\exp\{-(4n-2)[t-(n-1)/(4n-2)]^2\} \\
& \quad \times \{1 - 4t^3(n-3)(2n-1)/3 - 2t^4(2n-1)(n^2-11n+16)/3 + \ldots\},
\end{aligned} \tag{45}$$

where the series, producing skewness, must be continued to many terms.


For a given numerical value of n a reasonably good approximation to the case curve can 
be obtained by the method of Karl Pearson. When n = 2, 3 the mean cases are, from (39),

$$(-23 + 36t)e^{-4t} + (24 + 18t)e^{-6t}$$
and
$$(1140 + 720t)e^{-12t} + (2700t - 645)e^{-10t} + (900t - 489)e^{-6t},$$
respectively. 

The single negative values of t for which these curves cut the t-axis are -0.3918 and
-0.2500 respectively. The areas under the curves and above the t-axis from these points to
infinity are easily found, the cases at time t are divided by the respective areas and finally
the origin is shifted to the points of intersection of the curve with the t-axis.

For n = 2, T = t + 0.3918 and the equation of the adjusted case curve is

$$y(T) = (-146.91 + 142.59T)e^{-4T} + (146.91 + 156.08T)e^{-6T},$$

which is like a probability density function in that $\int_0^\infty y(T)\,dT = 1$.
The first four moments of y(T) about T = 0 and the mean, $\bar{T} = 0.7984$, are

$$\nu_2 = 0.8370, \quad \nu_3 = 1.0679, \quad \nu_4 = 1.5880$$
and
$$\mu_2 = 0.1996, \quad \mu_3 = 0.0809, \quad \mu_4 = 0.1597.$$





204 Stochastic cross-infection between two otherwise isolated groups 


These yield $\beta_1 = \mu_3^2/\mu_2^3 = 0.823$, $\beta_2 = \mu_4/\mu_2^2 = 4.01$,


which, by Biometrika Table 43, indicates the Pearson Type I curve as the best fit, although 
Type III might fit better for higher values of n. For n = 2 the cases at time t are
const. x (1 +¢/0-3918)!64(1—t/b,)"", b,=5-88. 
This gives the mode at time 0.16, rather earlier than the correct value 0.21, obtained by
solving y'(t) = 0.
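
The passage from moments about the origin to the Pearson criteria can be retraced numerically. The sketch below is an illustration only: the function name `pearson_betas` is hypothetical, and the inputs are the rounded four-figure moments printed above, so the last digit of the results may differ slightly from the quoted values.

```python
def pearson_betas(mean, nu2, nu3, nu4):
    """Convert moments about the origin into central moments and Pearson's beta_1, beta_2."""
    mu2 = nu2 - mean ** 2
    mu3 = nu3 - 3 * mean * nu2 + 2 * mean ** 3
    mu4 = nu4 - 4 * mean * nu3 + 6 * mean ** 2 * nu2 - 3 * mean ** 4
    beta1 = mu3 ** 2 / mu2 ** 3   # squared skewness
    beta2 = mu4 / mu2 ** 2        # kurtosis
    return mu2, mu3, mu4, beta1, beta2

# Moments of the adjusted case curve for n = 2, as printed in the text.
mu2, mu3, mu4, beta1, beta2 = pearson_betas(0.7984, 0.8370, 1.0679, 1.5880)
print(round(mu2, 4), round(mu3, 4), round(mu4, 4), round(beta1, 2), round(beta2, 2))
```

The pair (beta1, beta2) is what would be looked up in the Pearson-type chart to choose the curve.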
Using T in the sense above and using a Type III curve it is found that


(7) = e4T T7193 for n=2, T') = e SUT Tes for n=3. 
Y Y 


It is surmised in general that $y(T) = e^{-2nT}\,T^{n+p}$, where p is positive. An improved fit could
be obtained by calculation of exact case values, using (39), for selected times and applying 
maximum likelihood to calculate the constants. 


REFERENCES 


Bailey, N. T. J. (1950). Biometrika, 37, 193.
Haskey, H. W. (1954). Biometrika, 41, 272. 



















[ 205 ] 


SOME STATISTICS ASSOCIATED WITH THE RANDOM 
DISORIENTATION OF CUBES 


By J. K. MACKENZIE AND M. J. THOMSON


Division of Tribophysics, Commonwealth Scientific and Industrial Research
Organization, University of Melbourne, Australia


ABSTRACT. A Monte Carlo method is used to estimate the frequency functions of various angles in 
a random aggregate of cubic crystals. Estimates are made of the frequency function for the angle of
disorientation, i.e. the least angle of rotation required to rotate a crystal into the same orientation as 
a neighbouring crystal, and for the angles Min <100>, Min<110>, Min<112>, Min <123> and 
Min [<110>, <112>, <123>], where Min <100> is defined as the least of the nine acute angles between 
<100> directions in neighbouring crystals and similar definitions apply for the other angles. 


1. INTRODUCTION 


This paper is concerned with the estimation, by means of random sampling, of some prob- 
ability distributions which arise from a class of problems in three-dimensional geometrical 
probability. The results obtained are of interest to metallurgists in particular and perhaps 
to crystallographers in general. They are presented here in the hope that some statistician 
may be sufficiently interested to try to obtain the exact distributions in a more or less 
explicit form. 

Before stating the specific problems it is desirable in the interests of the general reader
to describe the standard crystallographic notation for directions and planes. A particular
direction can be specified by the components u, v, w of a vector (in this direction) relative
to an orthonormal basis and the symbol [uvw], in square brackets, is used to denote this
direction. Thus, [100] is the direction of the x-axis. A cube with its centre at the origin and
edges parallel to the base vectors (axes) is invariant under the 48 symmetry operations of the
cubic group consisting of 24 proper rotations and 24 improper rotations which are proper
rotations together with an inversion or reflexion. Starting with a given direction [uvw],
47 other equivalent directions (24 lines in all) can be derived by the use of the symmetry
operations and are called variants of [uvw]. These are simply derived from the given direction
by permuting the indices u, v, w in sign and in order in all possible ways. This set of
48 equivalent directions is denoted by <uvw>, in carets, and not all the 48 directions need
be distinct, e.g. the set <100> consists of the set of 6 directions [100], [$\bar{1}$00], [010], [0$\bar{1}$0],
[001], [00$\bar{1}$], the bar over an index being used instead of a minus sign. Similarly, if h, k, l
are the components of a normal to a particular plane, this plane is denoted by the symbol
(hkl), in brackets, while the set of all planes equivalent to the plane (hkl) is derived in the
same way and denoted by {hkl}, in braces. These symbols for planes are not used in the
present paper.

The simplest of all the problems under consideration can be stated as follows. Given a 
single fixed reference line (defined by one of two opposite directions) and another single 
line defined by a random direction, uniformly distributed on a sphere, what is the prob- 
ability distribution of the least angle between these two lines or of the least angle of rota- 
tion required to make the random line coincide with the reference line? It is known that 
the cosine of both these angles is uniformly distributed in the range (0, 1), but what is the 
answer if instead of two single lines there are two congruent (i.e. superposable) sets of lines, 





206 Random disorientation of cubes 


the lines of each set being fixed relative to one another? The present paper gives practical 
answers to some problems of this latter type when the lines of each set are invariant under 
the rotations of the cubic group. 


2. STATEMENT OF PROBLEMS 


Consider two cubes, A and B, and imagine A to be a fixed reference cube and B to be 
initially coincident with A but free to rotate in any manner about the common centre of 
A and B. If B is rotated through an arbitrary angle about some arbitrary axis there are 24 
definite rotations which will restore B into coincidence with A. These rotations are just the 
reverse of the original rotation taken together with the 24 proper symmetry operations 
associated with a cube having indistinguishable faces. Further, each of these 24 rotations 
can be represented as a single rotation about some definite axis and through some definite 
angle. Then, of the 24 angles of rotation so defined, there is one (or more) which is least in 
magnitude and this least angle may be taken as a measure of the disorientation of the two 
cubes and will be called the angle of disorientation. 

In 1949, F. C. Frank proposed over morning coffee the problem of determining the greatest
possible angle of disorientation of two cubes. The answer to the analogous problem for
squares in two dimensions is of course 45°, but in three dimensions the answer is not at all
obvious. However, by a tedious consideration of all possibilities it can be shown that the
maximum value of the angle of disorientation is 2 arccos ¼(2 + √2) = 62.80°, and that the
rotation which achieves this maximum disorientation is most simply described as a rotation
of 90° about any of the axes <110>, i.e. axes parallel to a face diagonal. A more difficult
problem is to determine the probability distribution of the angle of disorientation when the
cube B takes all orientations with equal probability. An estimate of this probability distribution
is made in the present paper.

Another problem which has arisen in the course of experimental work (Ogilvie, 1952)
can be described in simplified form as follows. In an aggregate of cubic crystals a particular
event may occur only when one of the directions, <100> say, in one crystal is within, say,
5° of one of the <100> directions in a neighbouring crystal. Then it may be asked what proportion
of pairs of crystals in a random aggregate would comply with this last requirement.
The problem can be formulated as follows. Imagine a set of three fixed wires passing through
the centre and parallel to the directions <100> of cube A and a similar set of three wires
parallel to the directions <100> of cube B. Then, if B is given a random rotation there are
nine definite acute angles between pairs of wires (taking one from each set) and the least of
these angles will be called Min <100>. The probability distributions of Min <100>, and the
analogous Min <110>, Min <112>, Min <123> and Min [<110>, <112>, <123>] are also estimated
in the present paper.

The method of calculation is described in § 3 and the results are given in § 4. The method 
of constructing 150 random orthogonal matrices and their testing for randomness is set 
out in § 5.


3. METHOD OF CALCULATION 


The calculations were performed by the method of random sampling. Since any rotation 
can be represented by a 3 x 3 orthogonal matrix (Jeffreys & Jeffreys, 1946, p. 114), 150
random orthogonal 3 x 3 matrices were constructed as described in § 5. Then with each
matrix the following calculations were made. 









J. K. MACKENZIE AND M. J. THOMSON 207


If a given matrix represents a rotation through an angle θ about some axis, then the trace
of the matrix (i.e. the sum of the elements in the leading diagonal) is equal to 1 + 2 cos θ.
Thus, to determine the angle of disorientation it is necessary, in principle, to calculate the
trace of each of the 24 matrices found by combining the given matrix with the 24 symmetry
operations of a cube and to choose the greatest of these 24 values. However, this greatest
trace can be found easily from the following rule. Taking into account only the magnitude
of each element, add to the largest element the greater of the two diagonal sums of the four
elements not lying in the same row or column as the largest element. This rule can be derived
by straightforward but tedious consideration of all the possibilities.
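
The shortcut rule can be compared with the full search over the 24 symmetry operations. The sketch below is an illustration, not the authors' procedure: all function names are hypothetical, the 24 operations are generated as signed permutation matrices of determinant +1, and the random rotations are built from Python's own normal deviates.

```python
import itertools
import math
import random

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cube_rotations():
    """The 24 proper rotations of a cube: signed permutation matrices with det +1."""
    ops = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            m = [[0, 0, 0] for _ in range(3)]
            for row in range(3):
                m[row][perm[row]] = signs[row]
            if det3(m) == 1:
                ops.append(m)
    return ops

def max_trace_brute(r):
    """Greatest trace of P.R over the 24 cube rotations P."""
    return max(sum(p[i][j] * r[j][i] for i in range(3) for j in range(3))
               for p in cube_rotations())

def max_trace_rule(r):
    """Largest |element| plus the greater diagonal sum of the complementary 2x2 block."""
    a = [[abs(v) for v in row] for row in r]
    i, j = max(((i, j) for i in range(3) for j in range(3)),
               key=lambda ij: a[ij[0]][ij[1]])
    r1, r2 = [k for k in range(3) if k != i]
    c1, c2 = [k for k in range(3) if k != j]
    return a[i][j] + max(a[r1][c1] + a[r2][c2], a[r1][c2] + a[r2][c1])

def angle_deg(trace):
    """Rotation angle in degrees from trace = 1 + 2 cos(theta)."""
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(c))

def random_rotation(rng):
    """Random proper rotation; the columns are orthonormal vectors x, y and x cross y."""
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = [rng.gauss(0.0, 1.0) for _ in range(3)]
    s = math.sqrt(sum(v * v for v in x))
    x = [v / s for v in x]
    p = sum(a * b for a, b in zip(x, y))
    y = [b - p * a for a, b in zip(x, y)]
    t = math.sqrt(sum(v * v for v in y))
    y = [v / t for v in y]
    z = [x[1] * y[2] - x[2] * y[1], x[2] * y[0] - x[0] * y[2], x[0] * y[1] - x[1] * y[0]]
    return [[x[k], y[k], z[k]] for k in range(3)]

# Rotation of 90 degrees about [110], the case of greatest disorientation.
h = math.sqrt(0.5)
r110 = [[0.5, 0.5, h], [0.5, 0.5, -h], [-h, h, 0.0]]
print(angle_deg(max_trace_rule(r110)), angle_deg(max_trace_brute(r110)))
```

The rule needs only nine magnitudes and a handful of additions per matrix, which is presumably why it suited hand calculation; the exhaustive search is kept here purely as a cross-check.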

Since the elements of the given matrix are just the cosines of the 9 angles between the
new and the old <100> directions, Min <100> is determined by the element of largest
magnitude in the matrix.

The remaining calculations are all similar and only the determination of Min <123> will
be described. First the matrix is used to calculate the new directions corresponding to each
of the 24 (proper) variants of [123] (the remaining 24 are obtained by changing all signs).
If a particular member of the set <123> is transformed into [$u_1 u_2 u_3$] then the cosine of the
angle between this direction and the nearest direction of the set <123> is

$$\tfrac{1}{14}(|u_s| + 2|u_m| + 3|u_l|),$$
where $u_s$, $u_m$ and $u_l$ are the numerically smallest, middle and largest of $u_1$, $u_2$ and $u_3$. The
greatest of the 24 cosines so calculated determines Min <123>. A table showing the bounds
for $|u_s|$, $|u_m|$ and $|u_l|$ consistent with a series of different angular deviations from [123]
was used to reject most of the 24 possibilities by visual inspection; the cosines were only
computed accurately for the few remaining cases.
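
The cosine formula amounts to pairing the sorted magnitudes of the transformed direction with 1, 2, 3. A minimal sketch follows, with hypothetical function names; it checks the formula against an explicit search over all 48 variants and, for simplicity, loops over all 48 rather than only the 24 proper ones (changing every sign leaves the magnitudes unaltered).

```python
import itertools
import math

def variants_123():
    """All 48 signed permutations of (1, 2, 3)."""
    return [tuple(s * v for s, v in zip(signs, perm))
            for perm in itertools.permutations((1, 2, 3))
            for signs in itertools.product((1, -1), repeat=3)]

def cos_nearest_formula(u):
    """(|u_s| + 2|u_m| + 3|u_l|)/14 for a vector u of squared length 14."""
    small, middle, large = sorted(abs(c) for c in u)
    return (small + 2 * middle + 3 * large) / 14.0

def cos_nearest_brute(u):
    """Largest cosine between u and any variant of [123], by direct search."""
    return max(sum(a * b for a, b in zip(u, v)) for v in variants_123()) / 14.0

def min_123_deg(r):
    """Min <123> in degrees for a rotation matrix r given as nested lists."""
    best = 0.0
    for v in variants_123():
        u = [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]
        best = max(best, cos_nearest_formula(u))
    return math.degrees(math.acos(min(1.0, best)))
```

The formula is an instance of the rearrangement inequality: a sum of products is greatest when both sequences are sorted the same way.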

Finally, when the results of the calculations had been accumulated for the 150 matrices, 
the number of cases in which the various angles lay in suitably chosen ranges were counted 
and the numbers so obtained used as estimates of the corresponding frequencies. 


4. RESULTS AND DISCUSSION 


The results are presented in the form of a series of histograms in Fig. 1. The ordinates have 
been normalized to represent probability densities when the unit of measurement along the 
abscissa is 1°, and the figures along the top of each histogram are the actual number of cases 
counted in the indicated range. If p is the estimated probability of an angle lying in a 
particular range, then an estimate of the standard error of p based on a sample of 150 is 
$[p(1-p)/150]^{1/2}$, and horizontal dotted lines have been drawn one standard error above and
below the top of each rectangle of the histograms. The mean $\bar{x}$ and the standard deviation s
of the estimated distribution are given on each histogram and are also indicated by means 
of the arrow and range at the bottom of each histogram. 
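
As a small illustration of how one histogram ordinate and its dotted limits are obtained, the sketch below uses a hypothetical count of 30 cases in a range 5° wide; the function name and the numbers are assumptions, not values read from Fig. 1.

```python
import math

def bar_estimate(count, width_deg, sample_size=150):
    """Return (density per degree, standard error of that density) for one range."""
    p = count / sample_size                      # estimated probability for the range
    se = math.sqrt(p * (1.0 - p) / sample_size)  # binomial standard error of p
    return p / width_deg, se / width_deg

density, density_se = bar_estimate(30, 5.0)
print(density, density_se)
```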

The dotted curves superposed on each histogram give an indication of the form of each 
frequency function. These have been adjusted so that the area under each curve is unity. 
Except for Min <110> each frequency function appears to have a single maximum, and
even in the case of Min <110> the existence of a double hump is by no means certain.
However, there are, for finer subdivisions of the ranges, indications of more than one hump
in some of the other frequency functions and the nature of the problem suggests that the
true frequency functions may consist of a number of continuous arcs which join at sharp
corners.










As was mentioned in the introduction, a tedious argument shows that the greatest
possible value for the angle of disorientation is 62.80°. A similar argument shows that the
greatest possible value of Min <100> is arccos ⅔ = 48.19° and that the rotation which achieves
the corresponding disorientation is most simply described as a rotation of 60° about an
axis <111>. Although no further results of this type are known to the authors it is easily




















































































































































[Fig. 1: six histograms of probability density against angle in degrees, with panels for the angle of disorientation, Min <100>, Min <110>, Min <123>, Min <112> and Min [<110>, <112>, <123>].]


Fig. 1. Histograms derived from a random sample of 150. The ordinate is probability density when the
angles are measured in degrees. The figures at the top are the number of cases counted in each
range while the horizontal dotted lines indicate limits corresponding to one standard error from
the estimated total probability for each range. The mean $\bar{x}$ and the standard deviation s of the
estimated distribution are indicated by the arrow and range at the bottom. An estimate of the
maximum value $x_m$ of each variable is also given. The dotted curves indicate the general shape of
each frequency function.


shown that a rotation of 45° about an axis <100> gives Min <110> = arccos ¼(2 + √2) = 31.40°,
and the results given in Fig. 1 suggest that this is probably its greatest possible value. The
values, $x_m$, of the greatest possible values of each variable are given in Fig. 1 and where the
value is associated with limits of error it has been estimated as follows.

A rotation is uniquely defined by three suitable independent variables (e.g. the Eulerian 
angles) so that the variation of Min <110>, say, can be represented by means of a four-dimensional
hypersurface. Now experience obtained in the calculation of the maximum
values of the angle of disorientation and Min <100> suggests that in all cases the appropriate















hypersurface can be approximated in the neighbourhood of x = $x_m$ by a number of hyperplanes
all intersecting at a common point. Thus, when x is sufficiently near $x_m$ the probability
density would be expected to be proportional to $(x_m - x)^2$. Except for the angle of
disorientation and Min <100> this expected behaviour can be roughly verified from Fig. 1
and $x_m$ has been estimated by plotting $p^{1/2}$ against the mean value of x for the range concerned.
The limits of error stated in Fig. 1 are no more than reasonable guesses and the maximum
30 ± 1° given for Min <110> is to be compared with the known result that the maximum is
probably 31.40°.

A simple argument accounts for the initial linear rise of the frequency functions for all
the Min <uvw> in Fig. 1. If the set <uvw> has 2n members defining n distinct lines, Min <uvw>
is the least of the n² angles $\theta_i$ between pairs of lines and cos $\theta_i$ is uniformly distributed on
the range (0, 1). Now the method of inclusion and exclusion shows that Pr {Min <uvw> < θ}
lies between the limits* $\sum_i \Pr\{\theta_i < \theta\}$ and $\sum_i \Pr\{\theta_i < \theta\} - \sum_{i<j} \Pr\{\theta_i < \theta,\ \theta_j < \theta\}$. Thus, for θ
sufficiently small,
$$\Pr\{\mathrm{Min}\,\langle uvw\rangle < \theta\} \sim n^2 \Pr\{\theta_i < \theta\} = n^2(1 - \cos\theta).$$
The corresponding density function is n² sin θ, and the estimated frequency functions in
Fig. 1 are in substantial agreement with this prediction up to an angle of about 24/n degrees.
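
The small-angle prediction can be tried by simulation for Min <100>, where n = 3 and n² = 9. The sketch below is only an illustration: it uses a much larger sample than the 150 matrices of the paper, Python's pseudo-random normal deviates, and hypothetical function names and seed.

```python
import math
import random

def random_rotation(rng):
    """Random proper rotation; the columns are orthonormal vectors x, y and x cross y."""
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = [rng.gauss(0.0, 1.0) for _ in range(3)]
    s = math.sqrt(sum(v * v for v in x))
    x = [v / s for v in x]
    p = sum(a * b for a, b in zip(x, y))
    y = [b - p * a for a, b in zip(x, y)]
    t = math.sqrt(sum(v * v for v in y))
    y = [v / t for v in y]
    z = [x[1] * y[2] - x[2] * y[1], x[2] * y[0] - x[0] * y[2], x[0] * y[1] - x[1] * y[0]]
    return [[x[k], y[k], z[k]] for k in range(3)]

def min_100_deg(r):
    """Min <100>: the angle whose cosine is the element of largest magnitude."""
    largest = max(abs(v) for row in r for v in row)
    return math.degrees(math.acos(min(1.0, largest)))

def small_angle_check(theta_deg, trials=20000, seed=1957):
    """Return (simulated Pr{Min <100> < theta}, predicted 9(1 - cos theta))."""
    rng = random.Random(seed)
    hits = sum(min_100_deg(random_rotation(rng)) < theta_deg for _ in range(trials))
    predicted = 9.0 * (1.0 - math.cos(math.radians(theta_deg)))
    return hits / trials, predicted

estimate, predicted = small_angle_check(5.0)
print(estimate, predicted)
```

Since the prediction is the upper of the two inclusion and exclusion limits, the simulated proportion should fall slightly below it as θ increases.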


5. CALCULATION OF RANDOM ORTHOGONAL MATRICES 


Since the elements in successive columns of an orthogonal 3 x 3 matrix can be regarded as 
the components of three orthogonal unit vectors, a random orthogonal matrix can be 
constructed as follows. Choose a random unit vector x and write its components as the first 
column. Choose a second random vector y’ which is independent of x. These two vectors 
define a random plane and in this plane there is a unit vector y perpendicular to x. The 
components of y form the second column while the third column consists of the components 
of x x y = z which is normal to the random plane. Thus, the problem is reduced to that of 
computing the components of a random unit vector. 
Let $x_1, x_2, x_3$ be three independent unit normal deviates with joint probability density

$$(2\pi)^{-3/2}\exp(-\tfrac{1}{2}\textstyle\sum x_i^2).$$

This density is constant on the surface of the sphere $\sum x_i^2$ = constant, so that given the value
of $S = \sum x_i^2$ the probability that the point $(x_1, x_2, x_3)$ lies in any area of the surface of the sphere
is simply proportional to that area. Thus, the direction of the vector $[x_1, x_2, x_3]$ is distributed
uniformly and
$$\mathbf{x} = [x_1, x_2, x_3]/S^{1/2}$$
is a random unit vector.
Similarly, we can find another independent random unit vector
$$\mathbf{y}' = [y_1, y_2, y_3]/T^{1/2},$$
where $T = \sum y_i^2$. Then, if $P = \sum x_i y_i$,
$$\mathbf{y} = [Sy_1 - Px_1,\ Sy_2 - Px_2,\ Sy_3 - Px_3]/[S(ST - P^2)]^{1/2}.$$

Finally, the remaining column of the required matrix follows by computing z = x × y.
The values of the random normal deviates were taken from the tables of Mahalanobis, 


* This remark is due to Dr H. A. David of the Department of Statistics, University of Melbourne. 












Bose, Ray & Banerji (1934) and as a check on the overall accuracy of the calculations the
column sums $c_1$, $c_2$ and $c_3$ were formed and it was verified that

$$c_1^2 + c_2^2 + c_3^2 = 3.$$

The standard deviation of divergences from this equality due to rounding off errors is about
2 units in the last figure retained in the matrix elements.
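
The construction and the column-sum check can be sketched as follows. This is an illustration under stated assumptions: Python's pseudo-random normal deviates stand in for the published tables, P is taken as the sum of products of the raw deviates, and the function names are hypothetical.

```python
import math
import random

def random_orthogonal(rng):
    """3x3 matrix whose columns are x, y and z = x cross y, built from six normal deviates."""
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = [rng.gauss(0.0, 1.0) for _ in range(3)]
    S = sum(v * v for v in x)
    T = sum(v * v for v in y)
    P = sum(a * b for a, b in zip(x, y))
    col1 = [v / math.sqrt(S) for v in x]
    norm = math.sqrt(S * (S * T - P * P))
    col2 = [(S * b - P * a) / norm for a, b in zip(x, y)]
    col3 = [col1[1] * col2[2] - col1[2] * col2[1],
            col1[2] * col2[0] - col1[0] * col2[2],
            col1[0] * col2[1] - col1[1] * col2[0]]
    return [[col1[i], col2[i], col3[i]] for i in range(3)]

def column_sum_check(m):
    """Sum of squared column sums; equal to 3 for any orthogonal matrix."""
    return sum(sum(m[i][j] for i in range(3)) ** 2 for j in range(3))

m = random_orthogonal(random.Random(1))
print(column_sum_check(m))
```

The check works because the column sums are the components of the vector [1, 1, 1] referred to the new axes, and an orthogonal transformation preserves its squared length of 3.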

The distributions of $x_1/S^{1/2}$ and of the column sums c are closely related to the t-distribution
and in three dimensions are both distributed uniformly* (Cramér, 1946, pp. 240, 387).
Thus, although they are not independent, all nine elements of a random orthogonal matrix
are uniformly distributed on the range (-1, 1) while the column sums are uniformly distributed
on the range (-√3, √3).

The 150 matrices were tested for deviations from these predictions by dividing the range
of each variable into 10 equal parts and testing for uniformity of distribution by means of
a χ²-test with 9 degrees of freedom. The greatest value of χ² obtained from the elements
was 15.7 and for the column sums 22.6, while the corresponding mean values of χ² were 9.4
and 16.9. However, after permuting the columns in all the six possible ways,† these maxima
dropped to 13.1 and 15.9 respectively while the corresponding means were 9.7 and 12.4.
Thus there was then no significant deviation from uniformity of distribution at the 5%
level (χ² = 16.9).
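
The uniformity test on ten equal parts can be sketched as below. The function name is hypothetical and the two data sets are artificial extremes chosen only to show the range of the statistic; they are not the authors' matrices.

```python
def chi_square_uniform(values, lo, hi, bins=10):
    """Chi-square statistic for uniformity of values on (lo, hi), using equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)  # clamp the upper end point
        counts[k] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

# 150 values spread evenly, 15 at the midpoint of each tenth of (-1, 1).
even = [-1.0 + (k + 0.5) * 0.2 for k in range(10)] * 15
# 150 values all falling in the last tenth.
lumped = [0.95] * 150
print(chi_square_uniform(even, -1.0, 1.0), chi_square_uniform(lumped, -1.0, 1.0))
```

With 10 parts there are 9 degrees of freedom, so a computed value would be compared with the 5% point 16.9 quoted above.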

The following four rotation matrices are typical of those computed by the above method: 


 0.8527   0.4846   0.1953         0.2294  -0.6454  -0.7286
 0.2780  -0.7374   0.6155         0.9035   0.4196  -0.0872
 0.4423  -0.4705  -0.7636         0.3620  -0.6383   0.6794

 0.0443  -0.9973   0.0584        -0.3763   0.5764   0.7254
 0.9773   0.0554   0.2045         0.7487   0.6504  -0.1285
-0.2072   0.0480   0.9771        -0.5458   0.4947  -0.6763


The authors wish to thank Dr E. J. Williams for suggesting the method used for cal- 
culating the random orthogonal matrices and Dr G. J. Ogilvie and Prof. F. C. Frank for 
discussion. Mr A. W. Davis checked the counting. 


REFERENCES 


Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

Jeffreys, H. & Jeffreys, B. S. (1946). Methods of Mathematical Physics. Cambridge University
Press.

Mahalanobis, P. C., Bose, S. S., Ray, P. R. & Banerji, S. K. (1934). Sankhyā, 1, 289.

Ogilvie, G. J. (1952). J. Inst. Met. 81, 491.


* Thus, the cosine of the angle between a fixed direction and a random direction is uniformly distributed
on the range (-1, 1). Hence, if ξ, the cosine of the co-latitude, is chosen at random in the
range (-1, 1) and the longitude φ at random in the range (-π, π),

$$[(1-\xi^2)^{1/2}\cos\phi,\ (1-\xi^2)^{1/2}\sin\phi,\ \xi]$$

is a random unit vector; the sign of the square root is taken positively and negatively at random. This
method of calculation of a random unit vector is suitable for high speed computers.
† This removes any bias due to calculating the successive columns in a definite order.











[ 211 ] 


THE DIFFERENCE BETWEEN CONSECUTIVE MEMBERS OF A 
SERIES OF RANDOM VARIABLES ARRANGED IN 
ORDER OF SIZE 


By J. H. DARWIN 


Applied Mathematics Laboratory, Department of Scientific and Industrial Research, 
Wellington, New Zealand 


1. INTRODUCTION AND SUMMARY 


Very commonly in analyses of variance a table of, say, treatment means is set out in order 
of their size and the experimenter desires to find if there is a significant break between 
successive treatment means. The theoretical problem of finding the distribution of the 
difference between the mth and (m + 1)th in a descending series of N from a known distribution
dF(x) readily reduces to the evaluation of an integral. For few distribution functions
F(x) can this integral be computed in terms of known tabulated functions. The
consequent numerical evaluation is often so tedious that special sorts of comparisons 
between treatment means have been suggested (e.g. see Tukey (1949), when F(x) is the 
normal distribution function with an unknown variance). 

In this note we find by the saddle-point method an approximate formula for the pro- 
bability of a greater difference than u between the mth highest and the (m+ 1)th highest 
of a sample of N drawn from dF (x). This method was first used for approximation on the 
real axis by Laplace (1820), and, in work similar to ours but on the range of the sample, by 
Cox (1948). The probability we find when F(x) is the normal integral with known variance,
is compared with that obtained by use of the limiting distribution of the mth and (m+ 1)th 
as deduced by Gumbel (1935) for distributions dF (x) satisfying certain conditions. 

Numerical calculations for the 'unstudentized' normal case are compared with previous
calculations of the same kind done by Irwin (1925) and the extension to the ‘studentized’ 
normal case is given without calculations. 


2. THE SADDLE-POINT APPROXIMATION 


Suppose we have N sample values $x_1, x_2, \dots, x_N$ arranged in descending order of size; suppose
the parent distribution of the $x_i$ is dF(x). Then the probability that $x_m - x_{m+1}$ is greater than
u is

$$P(u) = \frac{N!}{m!\,(N-m-1)!}\int_{-\infty}^{\infty}(1-F(x+u))^{m}\,(F(x))^{N-m-1} f(x)\,dx. \tag{1}$$
For the purpose of using the saddle-point method on this integral we set m = Np and
suppose N large. This process may be expected to give reasonable results if p is near ½ and
N is large. We shall later use the result for quite small values of N and m. The success of the 
approximation will then depend, especially when p is near 0 or 1, on the relative values of 
the first and second approximations, both of which we shall find in special cases. 
The integrand is of the form 


$$\exp\{N[p\log(1-F(x+u)) + (1-p)\log F(x)]\}\,f(x)/F(x) = \exp(NG(x))\,f(x)/F(x), \text{ say}. \tag{2}$$
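
For the unit normal distribution the integral (1) can be evaluated directly by quadrature, which gives a yardstick for the approximations that follow. The sketch below is an illustration only: the function names, the integration limits of ±10 and the use of Simpson's rule with 4000 steps are all assumptions.

```python
import math

def phi(x):
    """Unit normal frequency function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Unit normal integral F(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def prob_gap_exceeds(u, N, m, lo=-10.0, hi=10.0, steps=4000):
    """P(u) of (1): probability that x_m - x_(m+1) > u, by Simpson's rule (steps must be even)."""
    coef = math.factorial(N) / (math.factorial(m) * math.factorial(N - m - 1))

    def integrand(x):
        return (1.0 - cdf(x + u)) ** m * cdf(x) ** (N - m - 1) * phi(x)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return coef * total * h / 3.0

# Difference between the two largest of a sample of 3, for u = 0.1.
print(prob_gap_exceeds(0.1, 3, 1))
```

At u = 0 the integral reduces to unity, which is a convenient check on the constant factor.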










212 Difference between random variables arranged in order of size 


We suppose F(x) has continuous derivatives of the first four orders everywhere in its range. 
Any stationary value of G(x) is a solution for x of the equation 


$$\frac{p\,f(x+u)}{1-F(x+u)} = \frac{(1-p)\,f(x)}{F(x)}. \tag{3}$$

Suppose there is a unique root at $\hat{x}$. A useful approximation to it is
$$\hat{x} = x_p + a_1 u + a_2 u^2 + \dots,$$
where $F(x_p) = 1 - p$,
$$a_1 = -(1-p)[1 + p f'(x_p)/(f(x_p))^2]$$
and
$$a_2 (f(x_p))^2 + f''(x_p)\,p(1-p)(a_1 + \tfrac{1}{2}) + f(x_p) f'(x_p)(\tfrac{3}{2}a_1^2 + (2-p)a_1 + \tfrac{1}{2}(1-p)) = 0.$$


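The first approximation to P(u) is then

$$\frac{N!}{m!\,(N-m-1)!}\;\frac{(2\pi)^{1/2}\,(1-F(\hat{x}+u))^{m}\,(F(\hat{x}))^{N-m-1} f(\hat{x})}{\left\{ m\left(\dfrac{f(\hat{x}+u)}{1-F(\hat{x}+u)}\right)^{2} + m\,\dfrac{f'(\hat{x}+u)}{1-F(\hat{x}+u)} + (N-m)\left(\dfrac{f(\hat{x})}{F(\hat{x})}\right)^{2} - (N-m)\,\dfrac{f'(\hat{x})}{F(\hat{x})} \right\}^{1/2}}, \tag{4}$$

where $f'(\hat{x})$ and $f'(\hat{x}+u)$ = ∂f/∂x calculated at $\hat{x}$ and $\hat{x}+u$. Suppose this is written $P_s(u)$.
Then the next approximation to P(u) is $P_s(u)(1 + a/N)$, say, where a is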
—5(G"(@)" (2) | pF) 
24(G"(2))> — 8(G"(2))?_—-2(G"(2))? f(2) 





(X3X1_y—Xi_pX3). (5) 


In this $X_p^{(i)}$ and $X_{1-p}^{(i)}$ are respectively the coefficients of p and 1 - p in $G^{(i)}(x)$, the ith
partial derivative of G(x) with respect to x which is set equal to $\hat{x}$ after differentiation.
For a given F’, a is a function of p and u only and can be calculated if F and its derivatives 
have been tabulated. 
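
For the unit normal distribution the first approximation can be computed in a few lines. The sketch below is an illustration, not the author's computing scheme: the root of (3) is found by bisection on the interval (-8, 8) rather than from tables, the curvature term is -N G''(x) written out for the normal case (where f'(x) = -x f(x)), and all function names are hypothetical.

```python
import math

def phi(x):
    """Unit normal frequency function f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Unit normal integral F(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def sf(x):
    """Upper tail 1 - F(x), computed without cancellation."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def saddle_point(u, p):
    """Root of p f(x+u)/(1-F(x+u)) = (1-p) f(x)/F(x), by bisection."""
    def g_prime(x):
        return (1.0 - p) * phi(x) / cdf(x) - p * phi(x + u) / sf(x + u)
    lo, hi = -8.0, 8.0   # g_prime is positive at lo and negative at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g_prime(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def first_approximation(u, N, m):
    """First saddle-point approximation to P(u) for the unit normal distribution."""
    x = saddle_point(u, m / N)
    upper = phi(x + u) / sf(x + u)
    lower = phi(x) / cdf(x)
    d_upper = -(x + u) * phi(x + u)   # f'(x+u)
    d_lower = -x * phi(x)             # f'(x)
    curvature = (m * (upper ** 2 + d_upper / sf(x + u))
                 + (N - m) * (lower ** 2 - d_lower / cdf(x)))
    coef = math.factorial(N) / (math.factorial(m) * math.factorial(N - m - 1))
    return (coef * math.sqrt(2.0 * math.pi) * sf(x + u) ** m
            * cdf(x) ** (N - m - 1) * phi(x) / math.sqrt(curvature))

print(first_approximation(0.1, 3, 1))
```

Bisection is used because the left-hand side of (3) less the right-hand side is monotonic in x for the normal distribution, so the root is unique and bracketed.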


3. THE NORMAL INTEGRAL 


For applications the most important function F(x) is probably the normal integral 
$$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}y^2/\sigma^2}\,dy.$$


We consider first that o is one. For this distribution f(x)/F(x) is a monotonic decreasing 
function of x for all real x. This is clearly true for x > 0. For x < 0 the derivative of f(x)/F(x)
has the sign of 





_s ia (x e~t*/z) dx —e-** = xf dx<0. (6) 


Hence, as f(x+u)/(1 - F(x+u)) = f(-(x+u))/F(-(x+u)),
it follows that [f(x)/F(x)]/[f(x+u)/(1 - F(x+u))] is a monotonic decreasing function of x. It
decreases from infinity to 0 as x increases from minus to plus infinity. Hence there is a
unique real root of (3).

By virtue of (6), $G''(\hat{x})$ is less than 0 for all positive u, and the path of the saddle-point
method is the real axis from minus to plus infinity. Since f(x) is symmetrical about 0,
$\hat{x}(p) + \hat{x}(1-p) = -u$, where the dependence of $\hat{x}$ on p is expressed by writing $\hat{x} = \hat{x}(p)$.
It follows easily that a(p) = a(1 - p), where a(p) expresses a as a function of p. Some













J. H. Darwin 213 


calculations have been made from the formula (5) for values of p from 0.01 to 0.50 and for
values of u which span the 5% and 1% significance points for N up to 10 and probably for
N up to 50 (to judge from the approximation in § 4); ma/N turns out to vary surprisingly
little for a particular value of u as m/N varies from 1/N to ½. The calculated values of
ma/N are given in Table 1.


Table 1. Values of ma/N

  u \ p |  0.01     0.02     0.05     0.10     0.20     0.30     0.50
  ------+--------------------------------------------------------------
  0.1   | 0.0805     -        -      0.0800     -        -      0.1159
  0.2   | 0.0775     -      0.0762   0.0758   0.0773   0.0819   0.1076
  0.5   | 0.0607     -        -      0.0653   0.0646   0.0670   0.0863
  1.0   | 0.0597   0.0575   0.0547   0.0518   0.0489   0.0488   0.0607
  2.0   | 0.0450   0.0425   0.0387   0.0345   0.0297   0.0275   0.0321
  3.0   | 0.0362   0.030    0.0286   0.0243   0.0193   0.0165   0.0175





We could use either of the equations 
$P_s(u)$ = 0.05 (or 0.01), (7)
or $P_s(u)(1 + a/N)$ = 0.05 (or 0.01), (8)


to give approximate significance points u. Naturally, one expects (8) to yield values of 
u closer to the true significance points. The evidence that we have for the accuracy of the 
points given by (7) and (8) is a comparison of the probabilities $P_s(u)$ and $P_s(u)(1 + a/N)$
with the probabilities for two distributions worked out by Irwin (1925). These distributions 
were of the difference between the two largest observations in samples of sizes 3 and 10. 
The comparison is given in Table 2. 


Table 2. Comparison of approximations with Irwin's exact values

          u               |  0.1     0.2     0.3     0.4     0.5     0.7     1.0     2.0     3.0
  ------------------------+------------------------------------------------------------------------
  N = 3   Irwin's values  | 0.917   0.836   0.760   0.687    -      0.493   0.339   0.069   0.008
          P_s(u)          | 0.8389  0.7699  0.7031  0.6389   -      0.4649  0.3228  0.0672  0.0079
          P_s(u)(1 + a/N) | 0.9148  0.8348  0.7584  0.6857   -      0.4928  0.3388  0.0691  0.0081
  N = 10  Irwin's values  | 0.855   0.727   0.613    -      0.427    -    [illegible] 0.011   -
          P_s(u)          | 0.7904  0.6734  0.5698   -      0.3994   -      0.1450  0.0110   -
          P_s(u)(1 + a/N) | 0.8536  0.7245  0.6108   -      0.4255   -      0.1525  0.0113   -























One might expect that these distributions, corresponding respectively to a low value 
of N and a low value of p, would provide a severe test of our approximation. The success of 
$P_s(u)(1 + a/N)$ over the whole range of u and of $P_s(u)$ for the higher values of u suggests that
(8) will give values of u sufficiently correct for practical purposes for N between 3 and 10,
and that we will be very close to these values if we use (7). We give in Table 3 the signi- 
ficance points correct to two places for N from 3 to 10, deduced from (8). The figures deduced 
from (7) are given in brackets. 










The solution of (3) necessary to give the tables was made from the Biometrika Tables of
the normal integral and frequency function, by prospecting till $\hat{x}$ was contained between two
neighbouring two-decimal values of the variable. Linear inverse interpolation between these
values, at least for $\hat{x}$ up to 2.5, gave a solution which appeared to be correct to 5 decimal
places.


Table 3. Significance points for $u = x_m - x_{m+1}$

   N   Level |    m = 1         m = 2         m = 3         m = 4         m = 5
  -----------+---------------------------------------------------------------------
   3    1%   | 2.91 (2.90)
        5%   | 2.17 (2.16)
   4    1%   | 2.60 (2.59)   2.17 (2.17)
        5%   | 1.92 (1.90)   1.58 (1.58)
   5    1%   | 2.42 (2.41)   1.87 (1.86)
        5%   | 1.77 (1.76)   1.34 (1.34)
   6    1%   | 2.30 (2.29)   1.69 (1.69)   1.56 (1.56)
        5%   | 1.67 (1.66)   1.21 (1.20)   1.11 (1.11)
   7    1%   | 2.21 (2.20)   1.58 (1.57)   1.39 (1.39)
        5%   | 1.60 (1.59)   1.12 (1.11)   0.98 (0.98)
   8    1%   | 2.15 (2.13)   1.49 (1.49)   1.28 (1.28)   1.23 (1.22)
        5%   | 1.55 (1.53)   1.06 (1.05)   0.90 (0.89)   0.86 (0.85)
   9    1%   | 2.09 (2.08)   1.43 (1.42)   1.20 (1.20)   1.12 (1.11)
        5%   | 1.50 (1.49)   1.01 (1.00)   0.84 (0.83)   0.78 (0.77)
  10    1%   | 2.04 (2.03)   1.38 (1.37)   1.14 (1.14)   1.04 (1.04)   1.01 (1.01)
        5%   | 1.46 (1.45)   0.97 (0.96)   0.79 (0.79)   0.72 (0.72)   0.70 (0.70)
































In making Table 3 we found two neighbouring two-decimal values of u for which $P_s(u)$
lay just above and just below 0.05 or 0.01. From these we could get the figure in brackets
correct to at least 3 decimal places. Where necessary for the rounding off to 2 places, $P_s(u)$
for values of u between these was computed.

Interpolation was made in a graph drawn from the data of Table 1 to give 1 + a/N.
Upper and lower estimates were made of the true value, these being taken absurdly far
apart to ensure that the true value lay between them. Then it was usually possible to find
an unequivocal solution to (8) correct to two decimal places. Any doubtful cases were
resolved by the computation of a/N which always lay nearly half-way between the estimates
obtained from the graph.


3.1. The studentized test


In the application of Table 3 to testing the differences between successive means in an analysis of variance we should use means standardized by division by the known standard deviation of the means. Sometimes previous experience with the material of the experiment will provide a reliable estimate of the standard deviation. More usually, however, the standard deviation will be estimated from some error variance having $\nu$ degrees of freedom. For finite but large $\nu$, one may again use the saddle-point method to provide a series in $1/\nu$ giving the probability that $\sqrt{n}\,(\bar{x}_m-\bar{x}_{m+1})/s$ is greater than $u$. In this case $\bar{x}_m$ is the $m$th highest of $N$ means, each of $n$ observations, and $s^2$ is an estimate of the error







J. H. Darwin 215 


variance, with $\nu s^2$ having a $\chi^2$ distribution with $\nu$ degrees of freedom. Alternatively, one may read this series from Hartley (1944).

We have

$$\operatorname{Prob}\{\sqrt{n}\,(\bar{x}_m-\bar{x}_{m+1}) > us\} = P(u) + \frac{1}{4\nu}\left(u^2P''-uP'\right) + \frac{1}{\nu^2}\left(\frac{u^4P^{\mathrm{iv}}}{32}-\frac{u^3P'''}{48}-\frac{u^2P''}{32}+\frac{uP'}{32}\right)+\ldots, \tag{9}$$

where $P^{(k)} = (d/du)^k P(u)$ and $P(u)$ is as in (1). The derivatives of $P(u)$ may be obtained from (1) by differentiation under the integral sign. Each ratio $P^{(k)}/P$ may then be approximated by the method of § 2 by a power series in $f(\hat{x}+u)/\{1-F(\hat{x}+u)\}$, where $\hat{x}$ is again the solution of (3) and $P(u)$ is approximated by $P_1(u)$. No calculations have been made for the studentized case, so that the accuracy of this last suggested approximation is unknown.

When there is not enough previous experience for $\sigma$ to be well enough known to be assumed constant, a conservative procedure will be to use, not $s$, but an upper confidence limit for $\sigma$. Thus if we take, say, $\sigma_1$ as a 95% upper confidence limit for $\sigma$, then 19 times out of 20 when we make the statement 'the probability of getting a value of $u$ larger than $\sqrt{n}\,(\bar{x}_m-\bar{x}_{m+1})/\sigma_1$ is less than (1)' we shall be right. However, this use of Table 3 will not give much discrimination between means for low $\nu$, and its chief advantage over the use of significant differences obtained from the $t$ distribution is that it does produce the correct pattern of the strong dependence of the test values on $m$.


4. GUMBEL’S ASYMPTOTIC THEORY 


An interesting comparison is available with the asymptotic theory of the distribution of the $m$th value developed by Gumbel (1935). He discusses distributions of what he calls exponential type. For $F(x)$ to be of this type we must have approximately for large $x$

$$\frac{f(x)}{1-F(x)} = -\frac{f'(x)}{f(x)} \quad\text{and}\quad f^{(\nu)}(x) = (-1)^{\nu}\frac{f^{\nu+1}(x)}{\{1-F(x)\}^{\nu}}. \tag{10}$$

(10) is satisfied by $1-F(x) = \exp a(x-b)$ $(a<0,\ x>b)$. It is not to be expected that it will be a good approximation for the whole range of $x$ for a distribution as different from the exponential as is the normal. However, for our problem the whole range of $x$ is not so important in the evaluation of (1) as is the range around the expected values of the $m$th and $(m+1)$th highest values. Gumbel sets $F(u_m) = 1-m/N$ and expands $F(x)$ about $u_m$ in a Taylor's series, getting, by virtue of (10),

$$F(x) = 1-\frac{m}{N}\exp[-(x-u_m)f(u_m)N/m]. \tag{11}$$

Suppose this is a good enough approximation to $F(x)$ for it to be used in the integration leading to (1). Then the probability that $x_m-x_{m+1}$ is greater than $u$ is approximately

$$\exp[-Nuf(u_m)]. \tag{12}$$

We give in Table 4 the percentage points calculated from (12) for the normal distribution and the same cases as those of Table 3.
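The calculation behind Table 4 can be sketched as follows: setting (12) equal to the significance level $\alpha$ and solving for $u$ gives $u = -\ln\alpha/\{Nf(u_m)\}$ with $F(u_m) = 1-m/N$. The function name `gumbel_significance_point` is hypothetical, and the standard normal quantile and density come from Python's standard library rather than from anything in the paper.

```python
from math import log
from statistics import NormalDist


def gumbel_significance_point(N, m, alpha):
    """Solve exp(-N * u * f(u_m)) = alpha for u, following (12).

    u_m is defined by F(u_m) = 1 - m/N for the standard normal
    distribution; m must be smaller than N.
    """
    std_normal = NormalDist()
    u_m = std_normal.inv_cdf(1.0 - m / N)
    return -log(alpha) / (N * std_normal.pdf(u_m))


if __name__ == "__main__":
    for N, m in [(3, 1), (6, 2), (10, 5)]:
        for alpha in (0.01, 0.05):
            point = gumbel_significance_point(N, m, alpha)
            print(N, m, alpha, round(point, 2))
```

For the exponential distribution $f(u_m) = am/N$, so the same formula reduces to $u = -\ln\alpha/(am)$, in line with the remark that (12) is then exact.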





216 Difference between random variables arranged in order of size 


It appears that for the normal distribution (12) is conservative in claiming significance, but is not accurate enough for ordinary use. If $F(x) = 1-\exp(-ax)$, (12) is exact.





Table 4. Significance points for $x_m - x_{m+1}$ given by Gumbel's theory

  N | Level | m = 1 | m = 2 | m = 3 | m = 4 | m = 5
  3 | 1%    | 4.22  |       |       |       |
    | 5%    | 2.75  |       |       |       |
  4 | 1%    | 3.62  | 2.89  |       |       |
    | 5%    | 2.36  | 1.88  |       |       |
  5 | 1%    | 3.29  | 2.38  |       |       |
    | 5%    | 2.14  | 1.55  |       |       |
  6 | 1%    | 3.07  | 2.11  | 1.92  |       |
    | 5%    | 2.00  | 1.37  | 1.25  |       |
  7 | 1%    | 2.92  | 1.94  | 1.68  |       |
    | 5%    | 1.90  | 1.26  | 1.09  |       |
  8 | 1%    | 2.80  | 1.81  | 1.52  | 1.44  |
    | 5%    | 1.82  | 1.18  | 0.99  | 0.94  |
  9 | 1%    | 2.70  | 1.72  | 1.41  | 1.30  |
    | 5%    | 1.76  | 1.12  | 0.92  | 0.84  |
 10 | 1%    | 2.62  | 1.64  | 1.32  | 1.19  | 1.15
    | 5%    | 1.71  | 1.07  | 0.86  | 0.78  | 0.75
anc 
5. OTHER DISTRIBUTION FUNCTIONS

The exponential distribution provides a comparison of the theoretical probability with the known probability (12), which is $\exp(-amu)$. The approximation by $P_1(u)$ is then

$$\binom{N}{m}\sqrt{2\pi N}\;p^{m+\frac12}(1-p)^{N-m+\frac12}\exp(-amu). \tag{13}$$

This coincides with the correct value if the first term of Stirling's approximation to the gamma function is used for the three terms of the binomial coefficient. The second approximation is (13) multiplied by

$$1+\frac{a(p)}{N} = 1+\frac{p^2-p+1}{12Np(1-p)}, \tag{14}$$

balancing the second term in Stirling's approximation.
Again $a(p) = a(1-p)$, so that the correction is still symmetrical about $p = \frac12$, although the main term $\exp(-amu)$ is not symmetrical about $m = \frac12 N$. $a(p)/N$ is still small, achieving its highest value $\frac{1}{12}$ for $p = 1/N$ and $N$ large. (A similar analysis can be done when $F(x) = \exp(-\exp(-ax))$.) A test that is independent of the value of $a$ is possible if an independent estimate of $1/a$ is available from, say, the average of $r$ independent variables drawn from the distribution. The sum of $r$ such values, $X$ say, when multiplied by $a$ has the distribution $\{1/\Gamma(r)\}\exp(-x)\,x^{r-1}\,dx$, and the probability that $(x_m-x_{m+1})/X$ is greater than $u$ is

$$(1+mu)^{-r}. \tag{15}$$
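A small Monte Carlo check of the two exponential results, $\exp(-amu)$ and (15), might look like the sketch below. It is an illustration only: the function name `simulate_exponential_gaps`, the seed and the number of trials are arbitrary assumptions, not part of the paper.

```python
import math
import random


def simulate_exponential_gaps(N, m, a, u, r, trials=20000, seed=1):
    """Estimate P(x_m - x_{m+1} > u) and P((x_m - x_{m+1})/X > u) by simulation.

    x_m is the m-th highest of N exponential variables with rate a, and X is
    the sum of r further independent variables from the same distribution.
    """
    rng = random.Random(seed)
    gap_hits = 0
    ratio_hits = 0
    for _ in range(trials):
        ordered = sorted((rng.expovariate(a) for _ in range(N)), reverse=True)
        gap = ordered[m - 1] - ordered[m]          # x_m - x_{m+1}
        total = sum(rng.expovariate(a) for _ in range(r))
        gap_hits += gap > u
        ratio_hits += gap > u * total
    return gap_hits / trials, ratio_hits / trials


if __name__ == "__main__":
    N, m, a, u, r = 6, 2, 1.5, 0.3, 4
    estimated_gap, estimated_ratio = simulate_exponential_gaps(N, m, a, u, r)
    print(estimated_gap, math.exp(-a * m * u))      # simulated vs exp(-amu)
    print(estimated_ratio, (1 + m * u) ** (-r))     # simulated vs (15)
```

The second comparison does not depend on the rate `a`, which is the point of the studentized form (15).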










More usually no such independent sample will be available, and it is perhaps more appropriate then to test the ratio $x_m/x_{m+1}$ for $a = 1$. Instead of equation (3), we have the equation

$$\frac{pc\,f(cx)}{1-F(cx)} = \frac{(1-p)\,f(x)}{F(x)}, \tag{16}$$

when we are trying to find the probability that $x_m$ is greater than $cx_{m+1}$. This is satisfied by $\exp(-x) = pc/(pc+1-p)$. The first approximation to the probability is

$$\binom{N}{m}\sqrt{2\pi N}\,(pc)^{mc+\frac12}(1-p)^{N-m+\frac12}(pc+1-p)^{-N+m-mc-\frac12}, \tag{17}$$

and the correction factor is

$$1+\frac{a(p,c)}{N} = 1+\frac{(1-p)^2+pc(pc-1+p)}{12Ncp(1-p)(pc+1-p)}, \tag{18}$$

where $a(p,c) = c\,a(1-p, 1/c)$. The biggest value of $a/N$ is again $\frac{1}{12}$. The true value for $P(u)$ is

$$\binom{N}{m}\bigg/\binom{N-m+mc}{mc}, \tag{19}$$

and the correction factor balances the second term in Stirling's approximation to the terms of the denominator of (19).

5.2. The rectangular distribution

Another distribution that provides a check on the accuracy of $P_1(u)$ and $P_1(u)(1+a/N)$ is the rectangular distribution, $F(x) = x$ in (0, 1). Then $P_1(u)$ is

$$\binom{N}{m}\sqrt{2\pi N}\;p^{m+\frac12}(1-p)^{N-m+\frac12}(1-u)^N, \tag{20}$$

and as in (14)

$$\frac{a(p)}{N} = \frac{p^2-p+1}{12Np(1-p)}.$$

The true value of $P(u)$ is $(1-u)^N$. Thus for both exponential and rectangular distributions and for the range of $u$ investigated for the normal distribution the correction factor is less than $\frac{1}{12}$.

6. RENORMALIZATION 


As suggested by the referee, we can compare the above results with those obtained by renormalizing $P_1(u)$ (see Cox, 1948; Daniels, 1956). We renormalize $P_1(u)$ by dividing by $P_1(0)$, where

$$P_1(0) = \binom{N}{m}\sqrt{2\pi N}\;p^{m+\frac12}(1-p)^{N-m+\frac12}. \tag{21}$$

When $u = 0$, $F(\hat{x}) = 1-p$, and, as $u$ tends to 0, $a(u)/N$ tends to

$$(1-p+p^2)/\{12Np(1-p)\},$$

so that $1+a/N$ agrees with $1/P_1(0)$ to order $1/N$.

In general, $a(u)/N$ seems to decrease as $u$ increases. Hence use of $P_1(u)/P_1(0)$ instead of $P_1(u)(1+a/N)$ will tend to overestimate the probability of a bigger difference than $u$. This difference between the two, however, is likely to be numerically more serious for $P_1(u)$ near $\frac12$ than for $P_1(u)$ near 0, since when $P_1(u)$ tends to zero the difference also tends to zero. This is illustrated by the values of $P_1(u)/P_1(0)$ for the same values of $u$ as in Table 2:

N = 3:  0.9223  0.8464  0.7730  0.7024  0.5111  0.3549  0.0739  0.0087
N = 10: 0.8579  0.7309  0.6185  0.4335  0.1574  0.0119










The pattern of the difference for $m \neq 1$ can be gauged from the sizes of the factors $1+a/N$ and $1/P_1(0)$ for $N = 20$. These are:

   m | 1 + a/N, decreasing from u = 0.2 to u = 3.0 | 1/P_1(0)
   1 | from 1.0762 to 1.0286                       | 1.0847
   2 | from 1.0379 to 1.0122                       | 1.0427
   4 | from 1.0193 to 1.0048                       | 1.0221
   6 | from 1.0137 to 1.0028                       | 1.0158
  10 | from 1.0108 to 1.0018                       | 1.0126
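The last column can be reproduced from the prefactor $\binom{N}{m}\sqrt{2\pi N}\,p^{m+\frac12}(1-p)^{N-m+\frac12}$, taking $p = m/N$ (an assumption here, although it is the choice under which the tabulated values are recovered). The sketch below also evaluates the limiting Stirling correction $1+(1-p+p^2)/\{12Np(1-p)\}$; the function names are hypothetical.

```python
from math import comb, pi, sqrt


def p1_at_zero(N, m):
    """First approximation P_1(0), with p taken as m/N."""
    p = m / N
    return comb(N, m) * sqrt(2 * pi * N) * p ** (m + 0.5) * (1 - p) ** (N - m + 0.5)


def stirling_correction_at_zero(N, m):
    """Limit of 1 + a(u)/N as u tends to 0."""
    p = m / N
    return 1 + (1 - p + p * p) / (12 * N * p * (1 - p))


if __name__ == "__main__":
    for m in (1, 2, 4, 6, 10):
        print(m, round(1 / p1_at_zero(20, m), 4),
              round(stirling_correction_at_zero(20, m), 4))
```

The two printed columns differ only in terms of order $1/N^2$, which is the sense in which renormalization and the correction factor agree at $u = 0$.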


The biggest differences between the factors occur for low values of $m$ and high values of $u$. For higher $u$, near the significance points, there is little to choose between $P_1(u)$, $P_1(u)(1+a/N)$ and $P_1(u)/P_1(0)$, since the difference between any pair is, for example, of
order (0-05) (1 +345) for the 5 % point; but for low values of u the second two are appreciably 
more accurate than the first. 


My thanks are due to Miss Mary Chung for help with the calculations. 


REFERENCES 


Cox, D. R. (1948). A note on the asymptotic distribution of range. Biometrika, 35, 310-15.
Daniels, H. E. (1956). The approximate distribution of serial correlation coefficients. Biometrika, 43, 169-85.
Gumbel, E. J. (1935). Les valeurs extrêmes des distributions statistiques. Ann. Inst. Henri Poincaré, 5, 115-58.
Hartley, H. O. (1944). Studentization or the elimination of the standard deviation of the parent population from the random sample-distribution of statistics. Biometrika, 33, 173-80.
Irwin, J. O. (1925). On a criterion for the rejection of outlying observations. Biometrika, 17, 238-50.
Laplace, P. S. (1820). Théorie Analytique des Probabilités, 1, part 2, chap. 1.
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5, 99-114.










RELATION BETWEEN THE DISTRIBUTIONS OF NON-CENTRAL t
AND OF A TRANSFORMED CORRELATION COEFFICIENT


By B. I. HARLEY 
University College, London 


1. If $r$ is the correlation coefficient in a sample of size $n$ drawn randomly from a normal bivariate population with zero correlation, then the quantity $(n-2)^{\frac12}r/(1-r^2)^{\frac12}$ is known to be distributed exactly as Student's $t$ with $(n-2)$ degrees of freedom.

By a comparison of moments we shall show that if the population correlation, $\rho$, is not zero, then

$$v = (n-2)^{\frac12}\frac{r}{(1-r^2)^{\frac12}}\,g\{\rho\},$$

where $g\{\rho\}$ is an appropriately chosen function of $\rho$, is distributed approximately as non-central $t$. We define the distribution of non-central $t$ having $f$ degrees of freedom as that of

$$t = \frac{z+\delta}{w^{\frac12}},$$

where $z$ is a unit normal variable having zero expectation, $w$ is an independent variable distributed as $\chi^2/f$ with $f$ degrees of freedom and $\delta$ is the non-central parameter. We shall suggest a form for $g\{\rho\}$ and determine appropriate relations between $n$ and $\rho$ on the one hand and $f$ and $\delta$ on the other. Finally, some numerical comparisons will be given and a method indicated whereby the approximate equivalence of the distributions may be used to supplement the tables of the non-central $t$ distribution given by Johnson & Welch (1939).


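2. Our first objective will be to obtain the moments of $r/(1-r^2)^{\frac12}$ when $\rho \neq 0$. The probability distribution of $r$ is known to be

$$p(r \mid n, \rho) = \frac{(1-\rho^2)^{\frac12(n-1)}(1-r^2)^{\frac12(n-4)}}{\pi(n-3)!}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\cos^{-1}(-r\rho)}{(1-r^2\rho^2)^{\frac12}}\right\}. \tag{1}$$

By integrating over the range of values of $r$ and denoting $\dfrac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\dfrac{\cos^{-1}(-r\rho)}{(1-r^2\rho^2)^{\frac12}}\right\}$ by $f^{n-2}(r\rho)$ we find that

$$\int_{-1}^{+1}(1-r^2)^{\frac12(n-4)}f^{n-2}(r\rho)\,dr = \frac{\pi(n-3)!}{(1-\rho^2)^{\frac12(n-1)}} = C(n,\rho).$$

Differentiating with respect to $\rho$ gives

$$\int_{-1}^{+1}r(1-r^2)^{\frac12(n-4)}f^{n-1}(r\rho)\,dr = \frac{(n-1)\pi(n-3)!\,\rho}{(1-\rho^2)^{\frac12(n+1)}} = C^1(n,\rho),$$

where $C^m(n,\rho) = \dfrac{\partial^m}{\partial\rho^m}C(n,\rho)$ for $m = 1, 2, \ldots$. Hence

$$\int_{-1}^{+1}\frac{r}{(1-r^2)^{\frac12}}(1-r^2)^{\frac12(n-3)}f^{n-1}(r\rho)\,dr = C^1(n,\rho). \tag{2}$$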








220 Transformed correlation coefficient 


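If we denote the expectation in a sample of size $k$ by $\mathcal{E}_k$, where $k$ can take integral values, then

$$\mathcal{E}_{n+1}\left\{\frac{r}{(1-r^2)^{\frac12}}\right\} = \frac{1}{C(n+1,\rho)}\int_{-1}^{+1}\frac{r}{(1-r^2)^{\frac12}}(1-r^2)^{\frac12(n-3)}f^{n-1}(r\rho)\,dr = \frac{C^1(n,\rho)}{C(n+1,\rho)}.$$

If $(n+1)$ is replaced by $n$, then

$$\mathcal{E}_n\left\{\frac{r}{(1-r^2)^{\frac12}}\right\} = \frac{C^1(n-1,\rho)}{C(n,\rho)}. \tag{3}$$

To obtain the second moment we consider

$$C^2(n-2,\rho) = \frac{\partial^2}{\partial\rho^2}C(n-2,\rho) = \int_{-1}^{+1}r^2(1-r^2)^{\frac12(n-6)}f^{n-2}(r\rho)\,dr = \int_{-1}^{+1}\frac{r^2}{1-r^2}(1-r^2)^{\frac12(n-4)}f^{n-2}(r\rho)\,dr.$$

Hence

$$\mathcal{E}_n\left\{\frac{r^2}{1-r^2}\right\} = \frac{C^2(n-2,\rho)}{C(n,\rho)}. \tag{4}$$

In general we have

$$\mathcal{E}_n\left\{\left(\frac{r}{(1-r^2)^{\frac12}}\right)^m\right\} = \frac{C^m(n-m,\rho)}{C(n,\rho)}, \tag{5}$$

since differentiating with respect to $\rho$ has the effect of multiplying by $r$ and increasing the order of the derivative by one.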
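From equation (5) the first four moments about the origin of

$$v = (n-2)^{\frac12}\frac{r}{(1-r^2)^{\frac12}}\,g\{\rho\}$$

are found to be

$$\mu_1'(v) = \mathcal{E}_n(v) = \frac{(n-2)^{\frac32}}{n-3}\,\frac{\rho}{(1-\rho^2)^{\frac12}}\,g\{\rho\}, \tag{6}$$

$$\mu_2'(v) = \mathcal{E}_n(v^2) = \frac{n-2}{n-4}\left(1+\frac{(n-1)\rho^2}{1-\rho^2}\right)[g\{\rho\}]^2, \tag{7}$$

$$\mu_3'(v) = \mathcal{E}_n(v^3) = \frac{(n-2)^{\frac52}\,\rho}{(n-3)(n-5)(1-\rho^2)^{\frac12}}\left(3+\frac{n\rho^2}{1-\rho^2}\right)[g\{\rho\}]^3, \tag{8}$$

$$\mu_4'(v) = \mathcal{E}_n(v^4) = \frac{(n-2)^2}{(n-4)(n-6)}\left(3+\frac{6(n-1)\rho^2}{1-\rho^2}+\frac{(n-1)(n+1)\rho^4}{(1-\rho^2)^2}\right)[g\{\rho\}]^4. \tag{9}$$

From these equations we obtained the moments about the mean and the beta-ratios, but since they are not necessary for the subsequent work, they are not reproduced here owing to their length.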







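3. Johnson & Welch gave the first three moments of the non-central $t$ distribution, and since $w$ and $z$ are independent it is a straightforward matter to find the fourth moment and the beta-ratios.

We find that, for the moments about $t = 0$,

$$\mu_1'(t) = \left(\tfrac12 f\right)^{\frac12}\frac{\Gamma\{\tfrac12(f-1)\}}{\Gamma(\tfrac12 f)}\,\delta, \tag{10}$$

$$\mu_2'(t) = \frac{f}{f-2}(1+\delta^2), \tag{11}$$

$$\mu_3'(t) = \left(\tfrac12 f\right)^{\frac32}\frac{\Gamma\{\tfrac12(f-3)\}}{\Gamma(\tfrac12 f)}\,(\delta^3+3\delta), \tag{12}$$

and from these equations we obtained the moments about the mean and the beta-ratios.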


4. To bring the distributions of $r/(1-r^2)^{\frac12}$ and of non-central $t$ into correspondence we have to specify a form for the function $g\{\rho\}$ and to relate $n$ and $\rho$ to $f$ and $\delta$. It is clear that the link could be made in a number of ways. After various methods had been examined, that described below was chosen, partly because it led to simple relations which did not involve any approximation for the Gamma function and also because it appeared to provide satisfactory results in the cases where numerical comparisons were made.

From equations (10) and (12) we have

$$\frac{\mu_3'(t)}{\mu_1'(t)} = \frac{f}{f-3}(3+\delta^2), \tag{14}$$

and from equations (6) and (8) it follows that

$$\frac{\mu_3'(v)}{\mu_1'(v)} = \frac{n-2}{n-5}\left(3+\frac{n\rho^2}{1-\rho^2}\right)[g\{\rho\}]^2. \tag{15}$$

Equating these moment ratios gives

$$\frac{n-2}{n-5}\left(3+\frac{n\rho^2}{1-\rho^2}\right)[g\{\rho\}]^2 = \frac{f}{f-3}(3+\delta^2). \tag{16}$$

Next equating the values of $\mu_2'(v)$ and $\mu_2'(t)$ given in equations (7) and (11) we have

$$\frac{n-2}{n-4}\left(1+\frac{(n-1)\rho^2}{1-\rho^2}\right)[g\{\rho\}]^2 = \frac{f}{f-2}(1+\delta^2). \tag{17}$$

If we take $f = n-2$, a result which gives exact correspondence when $\rho = 0$, then (16) and (17) are satisfied when

$$\delta = \left(\frac{(2n-3)\rho^2}{2-\rho^2}\right)^{\frac12} \quad\text{and}\quad g\{\rho\} = \left(\frac{2(1-\rho^2)}{2-\rho^2}\right)^{\frac12}. \tag{18}$$

Apart from the second moments about zero, the moments of the two corresponding distributions will not agree exactly, but some results given in Table 1 suggest that the representation of one distribution by the other may nevertheless be quite good.
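The moment matching can be checked numerically with a short sketch such as the one below. It evaluates the first three moments about zero of $v$ (with $g\{\rho\}$ from (18)) and of non-central $t$ (with $f = n-2$ and $\delta$ from (18)), using the closed forms that follow from (5) and from the definition of non-central $t$. The function names are hypothetical.

```python
from math import gamma, sqrt


def delta_from_rho(rho, n):
    """Non-central parameter paired with rho and n, as in (18)."""
    return sqrt((2 * n - 3) * rho ** 2 / (2 - rho ** 2))


def v_raw_moments(n, rho):
    """First three moments about zero of v, with g{rho} taken from (18)."""
    g = sqrt(2 * (1 - rho ** 2) / (2 - rho ** 2))
    q = rho ** 2 / (1 - rho ** 2)
    scale = rho / sqrt(1 - rho ** 2)
    m1 = (n - 2) ** 1.5 / (n - 3) * scale * g
    m2 = (n - 2) / (n - 4) * (1 + (n - 1) * q) * g ** 2
    m3 = (n - 2) ** 2.5 / ((n - 3) * (n - 5)) * scale * (3 + n * q) * g ** 3
    return m1, m2, m3


def noncentral_t_raw_moments(f, delta):
    """First three moments about zero of non-central t."""
    m1 = sqrt(f / 2) * gamma((f - 1) / 2) / gamma(f / 2) * delta
    m2 = f / (f - 2) * (1 + delta ** 2)
    m3 = (f / 2) ** 1.5 * gamma((f - 3) / 2) / gamma(f / 2) * (delta ** 3 + 3 * delta)
    return m1, m2, m3


if __name__ == "__main__":
    n, rho = 15, 0.2
    print(v_raw_moments(n, rho))
    print(noncentral_t_raw_moments(n - 2, delta_from_rho(rho, n)))
```

For $\rho = 0.2$, $n = 15$ the two first moments come out close to the 0.7891 and 0.7889 of Table 1, while the second moments and the ratios $\mu_3'/\mu_1'$ coincide, as the construction of (18) requires.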













Table 1. Comparison of the values of the first two moments and $\sqrt{\beta_1}$ and $\beta_2$ of $v$ and $t$

  ρ   |  n | δ     | μ1'(v) | μ1'(t) | μ2(v)  | μ2(t)  | √β1(v) | √β1(t) | β2(v)  | β2(t)
  0.2 | 15 | 0.742 | 0.7891 | 0.7889 | 1.2103 | 1.2107 | 0.2150 | 0.2139 | 3.7386 | 3.7352
  0.2 | 27 | 1.020 | 1.0522 | 1.0521 | 1.1111 | 1.1113 | 0.1361 | 0.1356 | 3.3121 | 3.3115
  0.6 | 15 | 2.435 | 2.588  | 2.587  | 1.489  | 1.493  | 0.631  | 0.615  | 4.260  | 4.236
  0.6 | 27 | 3.346 | 3.451  | 3.450  | 1.346  | 1.353  | 0.404  | 0.369  | 3.556  | 3.627




















5. Johnson & Welch considered the case of calculating a value $t_\epsilon$ such that $P\{t > t_\epsilon\} = \epsilon$, where $\delta$ and $f$ are assumed known, and they gave tables of a function $\lambda(f, t_\epsilon, \epsilon)$ for various probability levels $\epsilon$. By an inverse use of these tables and an iterative process, a value of $t_\epsilon$ can be calculated when $f$ and $\delta$ are known, for the various probability levels $\epsilon$ given. If $t_\epsilon$ is required for other probability levels an estimate of $t_\epsilon$ could be made using the normal equivalent deviate as an estimate of $\lambda$, but this does not give a very good approximation in most cases.

The probability integral of $r$ in samples of size $n$ from a normal bivariate population has been calculated by F. N. David (1938) for values of $n = 3(1)25, 50, 100, 200, 400$ and $\rho = 0.0(0.1)0.9$, and thus by using the approximate connexion between $r$ and $t$ given in § 4 we can obtain values of $t_\epsilon$, given $\delta$ and $f$, for any probability level and not only those given in Johnson & Welch's tables.

The method of calculation of $t_\epsilon$, given $\delta$, $f$ and $\epsilon$, is as follows:


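(i) Using the relationship between $\delta$ and $\rho$ given in equation (18) we have that

$$\rho = \left(\frac{2\delta^2}{2n-3+\delta^2}\right)^{\frac12},$$

where $n = f+2$, and where from a consideration of the values of $\mu_1'(v)$ and $\mu_1'(t)$ we note that $\rho$ and $\delta$ have the same sign.

(ii) Using this value of $\rho$, and taking $n = f+2$, a value $r_\epsilon$ can be found from David's tables such that $P\{r > r_\epsilon\} = \epsilon$.

(iii) Finally, $t_\epsilon$ is calculated from the relationship

$$t_\epsilon = \left(\frac{2f(1-\rho^2)}{2-\rho^2}\right)^{\frac12}\frac{r_\epsilon}{(1-r_\epsilon^2)^{\frac12}}.$$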
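Steps (i) and (iii) are simple enough to sketch in code; step (ii) still needs a percentage point $r_\epsilon$ of the correlation coefficient, which is taken here as an input because it comes from David's tables. The function names are hypothetical, and the value of `r_eps` in the usage lines is a made-up placeholder, not a tabulated figure.

```python
from math import copysign, sqrt


def rho_from_delta(delta, f):
    """Step (i): rho paired with delta, with n = f + 2; rho takes the sign of delta."""
    n = f + 2
    magnitude = sqrt(2 * delta ** 2 / (2 * n - 3 + delta ** 2))
    return copysign(magnitude, delta)


def t_from_r(r_eps, rho, f):
    """Step (iii): convert a percentage point of r into one of non-central t."""
    scale = sqrt(2 * f * (1 - rho ** 2) / (2 - rho ** 2))
    return scale * r_eps / sqrt(1 - r_eps ** 2)


if __name__ == "__main__":
    f, delta = 7, 1.815
    rho = rho_from_delta(delta, f)       # close to 0.6, as in Table 2
    r_eps = 0.8                          # placeholder; look up in David's tables
    print(round(rho, 3), t_from_r(r_eps, rho, f))
```

When $\rho = 0$ the scale factor in step (iii) reduces to $f^{\frac12}$, recovering the exact Student's $t$ relation of § 1.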





To illustrate the accuracy of the approximation for given values of $\delta$, $f$ and $\epsilon$, values of $t_\epsilon$ were found by means of the $r$ approximation and compared with the exact values found by using Johnson & Welch's Table IV. For comparative purposes the values of $t_\epsilon$ using Johnson & Welch's first approximation, in which the equivalent normal deviate was taken as an estimate of $\lambda$, were also calculated. To simplify the entry into David's tables $\delta$ was chosen in such a way that $10\rho$ was integral, but, of course, this is not a necessary condition. From the nature of Johnson & Welch's Table IV it would seem reasonable to expect in most cases an accurate estimate of $t_\epsilon$ to three significant figures only. From the results given in Table 2, where the number of figures given is determined by the number given for $\lambda$ by Johnson & Welch, it is seen that the values of $t_\epsilon$ calculated by using Johnson & Welch's






























































































Table 2. Comparison of the values of $t_\epsilon$ calculated by three different methods at various probability levels

                              Values of t_ε
   f | δ     | ρ   | ε    | Exact | From r approximation | From Johnson & Welch's first approximation
   7 | 0.553 | 0.2 | 0.4  | 0.843 | 0.843 | 0.813
     |       |     | 0.2  | 1.512 | 1.513 | 1.457
     |       |     | 0.05 | 2.621 | 2.623 | 2.542
     |       |     | 0.01 | 3.90  | 3.90  | 3.93
     | 1.815 | 0.6 | 0.4  | 2.190 | 2.188 | 2.105
     |       |     | 0.2  | 2.992 | 2.992 | 2.876
     |       |     | 0.05 | 4.434 | 4.436 | 4.331
     |       |     | 0.01 | 6.20  | 6.20  | 6.45
     | 2.657 | 0.8 | 0.4  | 3.105 | 3.100 | 2.981
     |       |     | 0.2  | 4.030 | 4.026 | 3.867
     |       |     | 0.05 | 5.739 | 5.741 | 5.628
     |       |     | 0.01 | 7.87  | 7.89  | 8.34
     | 3.195 | 0.9 | 0.4  | 3.699 | 3.688 | 3.544
     |       |     | 0.2  | 4.708 | 4.701 | 4.512
     |       |     | 0.05 | 6.603 | 6.608 | 6.488
     |       |     | 0.01 | 8.99  | 9.01  | 9.60
  16 | 0.821 | 0.2 | 0.4  | 1.096 | 1.096 | 1.07
     |       |     | 0.2  | 1.727 | 1.727 | 1.699
     |       |     | 0.05 | 2.674 | 2.674 | 2.635
     |       |     | 0.01 | 3.60  | 3.60  | 3.57
     | 2.691 | 0.6 | 0.4  | 3.029 | 3.027 | 2.978
     |       |     | 0.2  | 3.760 | 3.758 | 3.697
     |       |     | 0.05 | 4.920 | 4.922 | 4.860
     |       |     | 0.01 | 6.12  | 6.12  | 6.12
     | 3.941 | 0.8 | 0.4  | 4.334 | 4.330 | 4.258
     |       |     | 0.2  | 5.159 | 5.156 | 5.071
     |       |     | 0.05 | 6.502 | 6.509 | 6.431
     |       |     | 0.01 | 7.91  | 7.94  | 7.95
     | 4.739 | 0.9 | 0.4  | 5.173 | 5.165 | 5.080
     |       |     | 0.2  | 6.068 | 6.063 | 5.962
     |       |     | 0.05 | 7.538 | 7.546 | 7.462
     |       |     | 0.01 | 9.10  | 9.12  | 9.17













Table IV agree with those calculated using the $r$ approximation, to this accuracy at least, in all cases where $\rho = 0.2$ and 0.6. For $\rho = 0.8$ and 0.9 the agreement is not quite as good, but for $\epsilon = 0.4$, 0.2 and 0.05 there is still only a difference of at most one unit in the third significant figure between the exact value and that from the $r$ approximation. Since Johnson & Welch give fewer decimal places for $\epsilon = 0.01$, one would expect that for large values of $\delta$ the third significant figure may be in error by one or two units, which may account for the seemingly larger discrepancy between the first two methods.

In most cases the method using the probability integral appears to be more accurate than Johnson & Welch's first approximation given in the last column of Table 2. Certainly these results suggest that the probability integral of $r$ could be used to obtain reasonably accurate values of $t_\epsilon$ for those values of $\epsilon$ not given in Johnson & Welch's Table IV.


REFERENCES 


David, F. N. (1938). Tables of the Ordinates and Probability Integral of the Distribution of the Correlation Coefficient in Small Samples. Cambridge University Press.
Johnson, N. L. & Welch, B. L. (1939). Biometrika, 31, 362.







ON THE SOLUTION OF ESTIMATING EQUATIONS FOR 
TRUNCATED AND CENSORED SAMPLES FROM 
NORMAL POPULATIONS* 


By A. CLIFFORD COHEN, Jr. 
The University of Georgia 


1. INTRODUCTION AND SUMMARY 


To calculate maximum-likelihood estimates of the mean and standard deviation of a 
normally distributed population from doubly truncated or from doubly censored samples, 
it is necessary to solve simultaneously a pair of rather complex non-linear estimating 
equations. Although solutions can be approximated by straightforward iterative pro- 
cedures, the calculations often become tedious and time-consuming. This paper is concerned 
with reducing computational labour required in obtaining these solutions. For use in the 
doubly truncated case, a chart which permits direct reading of the standardized terminals 
with an accuracy of from three to five units in the second decimal is included. For use in 
both the doubly truncated and doubly censored cases, an iterative procedure is devised, 
which, for a specified degree of accuracy, appears to require less computational effort than 
similar procedures previously proposed. For use in singly censored cases, where only one 
estimating equation is involved, a chart which permits direct reading of estimates of the 
standardized terminal with a degree of accuracy comparable to that possible in the doubly 
truncated case is presented. Since tables recently published by Cohen & Woodward 
(1953) reduce the calculation of estimates from singly truncated normal samples to the 


simple task of interpolating between table entries, that case is considered here only 
by reference. 


2. GRAPHIC SOLUTION OF ESTIMATING EQUATIONS FOR DOUBLY TRUNCATED SAMPLES 


We let $x$ designate a normally distributed random variable with probability density function

$$f(x) = (\sigma\sqrt{2\pi})^{-1}\exp[-(x-\mu)^2/(2\sigma^2)] \quad (-\infty < x < \infty). \tag{1}$$

For a doubly truncated sample consisting of $n$ observations from this population, each of which is subject to the restriction $x_0 \le x \le x_0+w$, where the sample terminals, $x_0$ and $x_0+w$, are fixed, the logarithm of the likelihood function is

$$L = -n\ln[I(\xi_1)-I(\xi_2)] - n\ln\sigma - \sum_{i=1}^{n}(x_i-\mu)^2/(2\sigma^2) + \text{const.}, \tag{2}$$

where

$$I(\xi) = \int_{\xi}^{\infty}\phi(t)\,dt, \qquad \phi(t) = (\sqrt{2\pi})^{-1}\exp(-\tfrac12 t^2), \tag{3}$$

and

$$\xi_1 = (x_0-\mu)/\sigma, \qquad \xi_2 = (x_0+w-\mu)/\sigma. \tag{4}$$


* Sponsored by the Office of Ordnance Research, U.S. Army. 










226 Truncated and censored samples from normal populations 


Maximum-likelihood estimating equations obtained by equating to zero the partial derivatives of $L$ with respect to $\mu$ and $\sigma$ respectively were given by Cohen (1950) as

$$\left.\begin{aligned}\sigma[Z_1-Z_2-\xi_1]-\nu_1 &= 0,\\ \sigma^2[1-\xi_1(Z_1-Z_2-\xi_1)-Z_2w/\sigma]-\nu_2 &= 0,\end{aligned}\right\} \tag{5}$$

where

$$Z_1 = \phi(\xi_1)/[I(\xi_1)-I(\xi_2)], \qquad Z_2 = \phi(\xi_2)/[I(\xi_1)-I(\xi_2)], \tag{6}$$

$$\nu_k = \sum_{i=1}^{n}(x_i-x_0)^k/n. \tag{7}$$

No information is assumed to be available about observations which might have been eliminated as a consequence of the restrictions on observation of $x$.

A procedure based on two-way interpolation was employed by Cohen (1950) to solve equations (5) simultaneously for maximum-likelihood estimates, $\hat\sigma$ and $\hat\xi_1$. With these values determined, $\hat\mu$ follows from (4) as

$$\hat\mu = x_0-\hat\sigma\hat\xi_1. \tag{8}$$

Throughout this paper, the symbol (^) serves to distinguish maximum-likelihood estimates from parameters estimated.

Expressing $\sigma$ as

$$\sigma = w/(\xi_2-\xi_1), \tag{9}$$


a result which follows from (4), further reduction of estimating equations (5) yields

$$\left.\begin{aligned}[Z_1-Z_2-\xi_1]/(\xi_2-\xi_1)-\nu_1/w &= 0,\\ [1+\xi_1Z_1-\xi_2Z_2-(Z_1-Z_2)^2]/(\xi_2-\xi_1)^2-s^2/w^2 &= 0,\end{aligned}\right\} \tag{10}$$

where $s^2$ is the truncated sample variance $(= \nu_2-\nu_1^2)$. The two equations of (10) are thus of the form $F_i(\xi_1,\xi_2) = H_i(\xi_1,\xi_2)-K_i = 0$ $(i = 1, 2)$, where for a given sample the $K_i$ are constants.

The above form for the second equation of (10) was suggested by Mr George W. Thomson, who, together with Friedman and Garelis (1954), tabulated $H_1(\xi_1,\xi_2)$ and $H_2(\xi_1,\xi_2)$ at intervals of 0.5 for the two arguments. As a means of circumventing interpolation difficulties imposed by the large tabular intervals and in order to further facilitate the solution of (10), the two families of curves, $F_1(\xi_1,\xi_2) = 0$ and $F_2(\xi_1,\xi_2) = 0$, were plotted for selected values of $\nu_1/w$ and $s/w$, and are presented here as Fig. 1.* Co-ordinates for points on these curves were read from large-scale graphs of $H_i(\xi_1,\xi_2)$ carefully plotted from the Thomson, Friedman & Garelis (1954) tables as functions of $\xi_1$ for the family of $\xi_2$ values.

The fact that $H_1$ and $H_2$ obey the relations

$$\left.\begin{aligned}H_1(\xi_1,\xi_2) &= [Z_1-Z_2-\xi_1]/(\xi_2-\xi_1) = 1-H_1(-\xi_2,-\xi_1),\\ H_2(\xi_1,\xi_2) &= [1+\xi_1Z_1-\xi_2Z_2-(Z_1-Z_2)^2]/(\xi_2-\xi_1)^2 = H_2(-\xi_2,-\xi_1),\end{aligned}\right\} \tag{11}$$

permitted one-half the points required in plotting Fig. 1 to be obtained by reflexion. From the first of the above relations, it follows that the graph of $H_1(\xi_1,\xi_2)-K = 0$ is the reflexion of the graph of $H_1(\xi_1,\xi_2)-(1-K) = 0$ about the line $\xi_1+\xi_2 = 0$, while from the second it follows that $H_2(\xi_1,\xi_2)$ is symmetrical about this line.


* The assistance of Miss Lola Kiser and Mr Robert Lathrop in the preparation of this chart is 
gratefully acknowledged. Copies of the original 8-figure tables are available and can be obtained on 
request from Mr G. W. Thomson, Ethyl Corporation, 1600 West Eight Mile Road, Detroit 20, 
Michigan, U.S.A. 







With $\nu_1/w$ and $s^2/w^2$ computed for a given sample, the intersection of corresponding curves of $F_1(\xi_1,\xi_2) = 0$ and $F_2(\xi_1,\xi_2) = 0$ is located on the chart of Fig. 1. Co-ordinates of this point, which can be read to within three to five units in the second decimal, are the required



Fig. 1. Estimation curves for doubly truncated samples. (1) Locate curve corresponding to sample value of $\nu_1/w$. (2) Follow curve thus located to its intersection with curve which corresponds to sample value of $s^2/w^2$. (3) Co-ordinates of intersection determined above, which may be read on scales along base and left edge of chart, are the required values of $\hat\xi_1$ and $\hat\xi_2$.


values of $\hat\xi_1$ and $\hat\xi_2$. This degree of accuracy is adequate for many practical applications, but when more precise results are necessary, values thus read serve as first approximations to be improved through iteration.

With $\hat\xi_1$ and $\hat\xi_2$ determined, estimates of the mean and standard deviation follow from (4) as

$$\hat\sigma = w/(\hat\xi_2-\hat\xi_1), \qquad \hat\mu = x_0-\hat\sigma\hat\xi_1. \tag{12}$$



Although estimating equations (10) were derived by the method of maximum likelihood, these same equations also result from equating the first and second truncated sample moments to corresponding population moments. In standardized population (complete distribution) units, the mean of the population central block (truncated distribution) from the population mean may be written as

$$\alpha_1 = \frac{1}{I(\xi_1)-I(\xi_2)}\int_{\xi_1}^{\xi_2}t\,\phi(t)\,dt = Z_1-Z_2,$$

where $\alpha_k$ designates the $k$th standardized moment of the truncated distribution from the population mean in population standard units. The mean of the truncated distribution about the left terminus is

$$\alpha_1-\xi_1 = Z_1-Z_2-\xi_1.$$

The foregoing may be made clearer by reference to Fig. 2.


[Fig. 2. Diagram locating the mean of the population central block relative to the population mean, 0, and the truncation points.]


The first equation of (10) is then the result obtained by equating the ratio of (i) distance of sample mean from $x_0$ to (ii) distance between truncation points, $w$, to the corresponding population ratio, $(Z_1-Z_2-\xi_1)/(\xi_2-\xi_1)$, otherwise denoted as $H_1(\xi_1,\xi_2)$.

The second moment of the central block in a standardized normal population about the mean of the whole population is

$$\alpha_2 = \frac{1}{I(\xi_1)-I(\xi_2)}\int_{\xi_1}^{\xi_2}t^2\phi(t)\,dt = 1+\xi_1Z_1-\xi_2Z_2.$$

The variance, $\bar\sigma^2$, of the central block of the population, is therefore

$$\bar\sigma^2 = \alpha_2-\alpha_1^2 = 1+\xi_1Z_1-\xi_2Z_2-(Z_1-Z_2)^2.$$

The second equation of (10) then results from equating the ratio of (i) sample variance to (ii) the square of the distance between truncation points, $w^2$, to the corresponding population ratio $[1+\xi_1Z_1-\xi_2Z_2-(Z_1-Z_2)^2]/(\xi_2-\xi_1)^2$, otherwise denoted as $H_2(\xi_1,\xi_2)$.







3. ITERATIVE SOLUTIONS 


When solutions to the estimating equations for doubly truncated samples obtained by reading co-ordinates from Fig. 1 as described in § 2 are not of the required accuracy, the standard maximum-likelihood iterative procedure (Newton's method) may be employed to compute corrections to these initial values. However, since the second partial derivatives of the likelihood function must be evaluated, each cycle of the iterative process is likely to prove tedious and laborious. Furthermore, as a consequence of neglecting second and higher powers of the corrections, Newton's method tends to produce rather slowly converging iterants during the first few cycles unless initial approximations are in a close neighbourhood of the solution. This difficulty has been recognized and discussed, for example, by Norton (1956). In an effort to overcome these objections, the method of successive substitutions as described by Scarborough (1930, pp. 191-5) has been employed to develop iterants of the form

$$\xi_1 = f(\xi_1,\xi_2), \qquad \xi_2 = g(\xi_1,\xi_2). \tag{13}$$

On clearing of fractions, equations (10) become

$$\left.\begin{aligned}[Z_1-Z_2-\xi_1] &= (\xi_2-\xi_1)\,\nu_1/w,\\ [1+\xi_1Z_1-\xi_2Z_2-(Z_1-Z_2)^2] &= (\xi_2-\xi_1)^2(s/w)^2.\end{aligned}\right\} \tag{14}$$

Using the identity

$$[1+\xi_1Z_1-\xi_2Z_2-(Z_1-Z_2)^2] = [1-(Z_1-Z_2-\xi_1)(Z_1-Z_2)-(\xi_2-\xi_1)Z_2],$$

the two equations of (14) may be combined to give

$$(s/w)^2(\xi_2-\xi_1)^2+[(Z_1-Z_2)\,\nu_1/w+Z_2](\xi_2-\xi_1)-1 = 0, \tag{15}$$

which is quadratic in $\xi_2-\xi_1$.

On applying the quadratic formula, it follows from (15) that

$$\xi_2-\xi_1 = \left\{-[(Z_1-Z_2)\,\nu_1/w+Z_2]+\sqrt{[(Z_1-Z_2)\,\nu_1/w+Z_2]^2+4s^2/w^2}\right\}w^2/(2s^2), \tag{16}$$

where the positive sign is taken with the radical since $\xi_2-\xi_1 > 0$. The first equation of (14) is solved for $\xi_1$ to obtain

$$\xi_1 = -[\xi_2\nu_1/w-(Z_1-Z_2)]/(1-\nu_1/w). \tag{17}$$

With this result substituted into (16), it follows that

$$\xi_2 = (Z_1-Z_2)+\left\{[(Z_1-Z_2)\,\nu_1/w+Z_2]-\sqrt{[(Z_1-Z_2)\,\nu_1/w+Z_2]^2+4s^2/w^2}\right\}w(\nu_1-w)/(2s^2). \tag{18}$$

Since (17) and (18) are of the forms specified by (13), they may be used to iterate to the required values $\hat\xi_1$ and $\hat\xi_2$ by successive substitutions as follows:

$$\left.\begin{aligned}\xi_2^{(i+1)} &= (Z_1^{(i)}-Z_2^{(i)})+\left\{[(Z_1^{(i)}-Z_2^{(i)})\,\nu_1/w+Z_2^{(i)}]-\sqrt{[(Z_1^{(i)}-Z_2^{(i)})\,\nu_1/w+Z_2^{(i)}]^2+4s^2/w^2}\right\}w(\nu_1-w)/(2s^2),\\ \xi_1^{(i+1)} &= -[\xi_2^{(i+1)}\nu_1/w-(Z_1^{(i)}-Z_2^{(i)})]/(1-\nu_1/w),\end{aligned}\right\} \tag{19}$$

where $\xi_j^{(i)}$ is the $i$th iterant (approximation) to $\xi_j$.
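One cycle of the successive substitutions (19) can be sketched as below. The sketch works with the ratios $\nu_1/w$ and $s^2/w^2$, so that $w(\nu_1-w)/(2s^2)$ becomes $(\nu_1/w-1)/(2s^2/w^2)$; the function names are hypothetical, and $I(\xi)$ is evaluated with the complementary error function from the standard library.

```python
import math


def phi(t):
    """Standard normal density, as in (3)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def upper_tail(t):
    """I(xi) of (3): the standard normal upper-tail area."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def z_values(xi1, xi2):
    """Z_1 and Z_2 of (6)."""
    denom = upper_tail(xi1) - upper_tail(xi2)
    return phi(xi1) / denom, phi(xi2) / denom


def h_functions(xi1, xi2):
    """H_1 and H_2 of (11), the population values of v1/w and s^2/w^2."""
    z1, z2 = z_values(xi1, xi2)
    width = xi2 - xi1
    h1 = (z1 - z2 - xi1) / width
    h2 = (1.0 + xi1 * z1 - xi2 * z2 - (z1 - z2) ** 2) / width ** 2
    return h1, h2


def substitution_cycle(xi1, xi2, v1_over_w, s2_over_w2):
    """One cycle of (19), returning the next iterants of xi1 and xi2."""
    z1, z2 = z_values(xi1, xi2)
    b = (z1 - z2) * v1_over_w + z2
    root = math.sqrt(b * b + 4.0 * s2_over_w2)
    new_xi2 = (z1 - z2) + (b - root) * (v1_over_w - 1.0) / (2.0 * s2_over_w2)
    new_xi1 = -(new_xi2 * v1_over_w - (z1 - z2)) / (1.0 - v1_over_w)
    return new_xi1, new_xi2


if __name__ == "__main__":
    v1_over_w, s2_over_w2 = h_functions(-1.0, 1.5)   # stand-in for sample values
    xi1, xi2 = -0.9, 1.4                             # stand-in for chart readings
    for _ in range(5):
        xi1, xi2 = substitution_cycle(xi1, xi2, v1_over_w, s2_over_w2)
        print(xi1, xi2)
```

Once the iterants have settled, (12) gives $\hat\sigma = w/(\hat\xi_2-\hat\xi_1)$ and $\hat\mu = x_0-\hat\sigma\hat\xi_1$. The sample ratios in the usage lines are generated from known terminals purely for illustration.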

With $\xi_1^{(0)}$ and $\xi_2^{(0)}$ determined from the chart of Fig. 1, improved approximations can be obtained by the successive application of (19). In many applications it turns out that these iterants result in a rapid advance toward the neighbourhood of the solution during the first cycle or two. Thereafter, convergence of succeeding iterants slows down as the solution is approached. This behaviour is opposite to that of Newton’s method, for which convergence of successive iterants is sometimes slow for values very far removed from the solution, but is more rapid in the neighbourhood of the solution. The two methods of iteration thus complement each other, and in practical applications where a high degree of accuracy in the solution of estimating equations is required, an efficient computational procedure consists of reading initial approximations $\xi_1^{(0)}$ and $\xi_2^{(0)}$ from Fig. 1, advancing to the neighbourhood of the solution with one or more cycles of (19), and obtaining final estimates with a single cycle of Newton’s method. In using Newton’s method for the final cycle, estimates of the variances are obtained simultaneously without the additional computational effort which would otherwise be necessary in evaluating the second derivatives separately.
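As an illustration only, one cycle of the iterants (19) might be sketched in Python as follows. The sketch assumes that, for the doubly truncated case, $Z_j$ is the normal ordinate at $\xi_j$ divided by the area between $\xi_1$ and $\xi_2$; that definition belongs to the earlier sections and is restated here as an assumption. The function names are hypothetical, and the arguments are the ratios $\nu_1/w$ and $s^2/w^2$ computed from the sample.

```python
import math


def phi(t):
    """Standard normal ordinate."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def lower_area(t):
    """Standard normal area below t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def iterate_19(xi1, xi2, v1_over_w, s2_over_w2):
    """One cycle of (19) for a doubly truncated sample.

    Assumption: Z_j = phi(xi_j) / (area between xi1 and xi2), j = 1, 2.
    """
    area = lower_area(xi2) - lower_area(xi1)
    z1 = phi(xi1) / area
    z2 = phi(xi2) / area
    a = (z1 - z2) * v1_over_w + z2
    radical = math.sqrt(a * a + 4.0 * s2_over_w2)
    # w(v1 - w)/(2 s^2) expressed through the ratios v1/w and s^2/w^2
    factor = (v1_over_w - 1.0) / (2.0 * s2_over_w2)
    new_xi2 = (z1 - z2) + (a - radical) * factor
    new_xi1 = -(new_xi2 * v1_over_w - (z1 - z2)) / (1.0 - v1_over_w)
    return new_xi1, new_xi2


# Starting values and ratios taken from the bushing example of Section 6.
xi1, xi2 = iterate_19(-2.50, 2.00, 0.54978, 0.041242)
```

Repeated calls, each fed with the output of the previous one, give the successive iterants; a stopping rule is left to the user.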

Newton’s method. This method of iteration is based on Taylor series expansions of estimating equations in the vicinity of the solution. With values $\sigma_0$ and $\mu_0$ designating approximate solutions to the estimating equations, $\partial L/\partial\mu = 0$ and $\partial L/\partial\sigma = 0$, we write

$$\mu = \mu_0+h,\qquad \sigma = \sigma_0+k. \tag{20}$$








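By Taylor’s theorem, neglecting powers of h and k above the first, we have

$$
\left.\begin{aligned}
h\,\frac{\partial^2L}{\partial\mu_0^2}+k\,\frac{\partial^2L}{\partial\mu_0\,\partial\sigma_0} &= -\frac{\partial L}{\partial\mu_0},\\
h\,\frac{\partial^2L}{\partial\sigma_0\,\partial\mu_0}+k\,\frac{\partial^2L}{\partial\sigma_0^2} &= -\frac{\partial L}{\partial\sigma_0}.
\end{aligned}\right\} \tag{21}
$$

Corrections h and k are then obtained as the simultaneous solution of (21).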
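For a doubly truncated sample, the coefficients of (21), obtained by differentiating (2), are

$$
\left.\begin{aligned}
\frac{\partial L}{\partial\mu} &= \frac{n}{\sigma^2}\,[\bar x-\mu-\sigma(Z_1-Z_2)],\\
\frac{\partial L}{\partial\sigma} &= \frac{n}{\sigma^2}\,[\{s^2+(\bar x-\mu)^2\}/\sigma-\sigma(1+\xi_1Z_1-\xi_2Z_2)],\\
\frac{\partial^2L}{\partial\mu^2} &= -\frac{n}{\sigma^2}\,[1+\xi_1Z_1-\xi_2Z_2-(Z_1-Z_2)^2],\\
\frac{\partial^2L}{\partial\mu\,\partial\sigma} &= -\frac{n}{\sigma^2}\left[\frac{2(\bar x-\mu)}{\sigma}-(Z_1-Z_2)(1+\xi_1Z_1-\xi_2Z_2)+\xi_1^2Z_1-\xi_2^2Z_2\right],\\
\frac{\partial^2L}{\partial\sigma^2} &= -\frac{n}{\sigma^2}\left[\frac{3\{s^2+(\bar x-\mu)^2\}}{\sigma^2}-(1+\xi_1Z_1-\xi_2Z_2)^2+\xi_1^3Z_1-\xi_2^3Z_2\right],
\end{aligned}\right\} \tag{22}
$$

where $\bar x\;(= x_0+\nu_1)$ has been written for the sample mean.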

Newton’s method has been set up here in terms of $\mu$ and $\sigma$ rather than $\xi_1$ and $\xi_2$ as for the successive substitutions of (19). Thereby, asymptotic variances of the estimates can be readily approximated using the second partial derivatives from (22), as evaluated for the final cycle of iteration. This is permissible, since these values closely approximate corresponding expected values which would be required for more exact calculation of the variances.


4. DOUBLY CENSORED SAMPLES


In this section we are concerned with doubly censored samples from a population with probability density function (1), such that out of a fixed total of N random observations, there are $n_1$ for which it is known only that $x<x_0$, $n_2$ for which it is known only that $x>x_0+w$, and $n = N-n_1-n_2$ fully measured observations in the interval $x_0\le x\le x_0+w$. For a sample of this type, the logarithm of the likelihood function is

$$L = n_1\ln[1-I(\xi_1)]+n_2\ln I(\xi_2)-n\ln\sigma-\sum_{i=1}^{n}(x_i-\mu)^2/2\sigma^2+\text{const.} \tag{23}$$


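This case also was considered by Cohen (1950), and the estimating equations obtained by setting $\partial L/\partial\mu = 0$ and $\partial L/\partial\sigma = 0$ may be expressed as

$$
\left.\begin{aligned}
{[Y_1-Y_2-\xi_1]}/(\xi_2-\xi_1)-\nu_1/w &= 0,\\
[1+\xi_1Y_1-\xi_2Y_2-(Y_1-Y_2)^2]/(\xi_2-\xi_1)^2-s^2/w^2 &= 0,
\end{aligned}\right\} \tag{24}
$$

where

$$
\left.\begin{aligned}
Y_1(\xi_1) &= \frac{n_1}{n}\,\frac{\phi(\xi_1)}{1-I(\xi_1)} = \frac{n_1}{n}\,Z(-\xi_1),\\
Y_2(\xi_2) &= \frac{n_2}{n}\,\frac{\phi(\xi_2)}{I(\xi_2)} = \frac{n_2}{n}\,Z(\xi_2),
\end{aligned}\right\} \tag{25}
$$

with $Z(\xi)$ designating the reciprocal of Mills' ratio, i.e.

$$Z(\xi) = \phi(\xi)/I(\xi). \tag{26}$$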


Tables of this ratio and of related functions recently appeared as an Editorial (1955) in this 


journal. In these tables, Z(£) is denoted as vai and Z(—£) as Z/Q, with X written for the 
standardized normal deviate rather than é. 










As given above, equations (24) are analogous to those of (10) for doubly truncated samples. They differ only in the occurrence of $Y_1$ and $Y_2$ in place of $Z_1$ and $Z_2$ respectively.

In §2 it was pointed out that for doubly truncated samples the method of maximum likelihood yields the same estimating equations as the method of moments. The same is true in the case of censored samples if we suppose that the $n_1$ and $n_2$ censored sample observations have the same first and second moments as the integrated moments for the tails of a normal distribution containing proportions $n_1/N$ and $n_2/N$ of the whole. This latter observation was made by Des Raj (1953), who described his method as one of modified moments.

Since $Y_1$ and $Y_2$ involve not only $\xi_1$ and $\xi_2$, but also $n_1$ and $n_2$, charts corresponding to Fig. 1 do not seem practical in this case. We must accordingly expect to start with less accurate initial approximations than in the doubly truncated case, and depend on iteration for improving these approximations to the required accuracy.

When $n_1$ and $n_2$ constitute only a small proportion of the sample, they may be neglected to the extent that initial approximations to $\xi_1$ and $\xi_2$ can be read from Fig. 1 as for a doubly truncated sample. When $n_1$ and $n_2$ are appreciable proportions of the total sample, better first approximations might be read from tables of the normal curve areas using the relations

$$\frac{n_1}{n_1+n_2+n} = \int_{-\infty}^{\xi_1}\phi(t)\,dt,\qquad \frac{n_2}{n_1+n_2+n} = \int_{\xi_2}^{\infty}\phi(t)\,dt. \tag{27}$$

For reasonably large samples, the two sets of initial approximations suggested above should be in fairly close agreement, and either set might be satisfactory as a starting point in the iterative process. When appreciable differences exist between the two sets of values, the choice of starting point might appropriately be based in some measure on the magnitude of $n_1$ and $n_2$ relative to the total sample size as already mentioned. Under some circumstances an average of the two might be preferred.
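Where tables of normal curve areas are replaced by a library routine, the relations (27) might be evaluated as in the following minimal Python sketch. The function name is hypothetical; `statistics.NormalDist` is the standard-library normal distribution, and both $n_1$ and $n_2$ are assumed to be positive.

```python
from statistics import NormalDist


def initial_xi_from_counts(n1, n, n2):
    """First approximations to xi1 and xi2 from the censored proportions, relations (27)."""
    total = n1 + n + n2
    std = NormalDist()                     # standard normal, mean 0 and s.d. 1
    xi1 = std.inv_cdf(n1 / total)          # area below xi1 equals n1/N
    xi2 = std.inv_cdf(1.0 - n2 / total)    # area above xi2 equals n2/N
    return xi1, xi2


# Counts from the time-mortality example of Section 6.
xi1, xi2 = initial_xi_from_counts(2, 40, 5)
```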

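Iteration procedures given in §3 for doubly truncated samples also apply here with slight modifications. The iterants of (19) are applicable in this case when $Y_1$ and $Y_2$ are substituted for $Z_1$ and $Z_2$ respectively. Likewise Newton’s method is applicable where the coefficients of (21) are obtained from (23) by differentiation. With $Y_1$ and $Y_2$ defined by (25) and $Z(\xi)$ defined by (26), these coefficients are

$$
\left.\begin{aligned}
\frac{\partial L}{\partial\mu} &= \frac{n}{\sigma^2}\,[\bar x-\mu-\sigma(Y_1-Y_2)],\\
\frac{\partial L}{\partial\sigma} &= \frac{n}{\sigma^2}\,[\{s^2+(\bar x-\mu)^2\}/\sigma-\sigma(1+\xi_1Y_1-\xi_2Y_2)],\\
\frac{\partial^2L}{\partial\mu^2} &= -\frac{1}{\sigma^2}\,[n+n_1Z'(-\xi_1)+n_2Z'(\xi_2)],\\
\frac{\partial^2L}{\partial\mu\,\partial\sigma} &= -\frac{1}{\sigma^2}\left[\frac{2n(\bar x-\mu)}{\sigma}-n_1A(-\xi_1)+n_2A(\xi_2)\right],\\
\frac{\partial^2L}{\partial\sigma^2} &= -\frac{1}{\sigma^2}\left[\frac{3n\{s^2+(\bar x-\mu)^2\}}{\sigma^2}-n+n_1C(-\xi_1)+n_2C(\xi_2)\right],
\end{aligned}\right\} \tag{28}
$$

where

$$
\left.\begin{aligned}
Z'(\xi) &= Z(\xi)\,[Z(\xi)-\xi],\\
A(\xi) &= Z(\xi)+\xi Z'(\xi),\\
C(\xi) &= \xi\,[Z(\xi)+A(\xi)].
\end{aligned}\right\} \tag{29}
$$

As written above, $Z'$ is the derivative of $Z$ with respect to $\xi$.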
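A single Newton cycle for the doubly censored case might be sketched as follows. This is an illustration under stated assumptions rather than a definitive routine: the function and argument names are hypothetical, $I(\xi)$ is taken to be the normal area above $\xi$ (as the likelihood (23) implies), and the second derivatives are obtained by differentiating (23) with the auxiliary functions of (29).

```python
import math


def phi(t):
    """Standard normal ordinate."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Z(t):
    """Reciprocal of Mills' ratio: ordinate over upper-tail area, as in (26)."""
    return phi(t) / (0.5 * math.erfc(t / math.sqrt(2.0)))


def dZ(t):
    """Derivative of Z, first line of (29)."""
    return Z(t) * (Z(t) - t)


def A(t):
    return Z(t) + t * dZ(t)


def C(t):
    return t * (Z(t) + A(t))


def newton_cycle_censored(mu, sigma, x0, w, n1, n, n2, xbar, s2):
    """One cycle of (20)-(21) for a doubly censored sample (hypothetical helper)."""
    xi1 = (x0 - mu) / sigma
    xi2 = (x0 + w - mu) / sigma
    y1 = (n1 / n) * Z(-xi1)
    y2 = (n2 / n) * Z(xi2)
    m2 = s2 + (xbar - mu) ** 2
    # First derivatives of (23)
    l_mu = n / sigma ** 2 * (xbar - mu - sigma * (y1 - y2))
    l_sigma = n / sigma * (m2 / sigma ** 2 - (1.0 + xi1 * y1 - xi2 * y2))
    # Second derivatives of (23)
    l_mm = -(n + n1 * dZ(-xi1) + n2 * dZ(xi2)) / sigma ** 2
    l_ms = -(2.0 * n * (xbar - mu) / sigma - n1 * A(-xi1) + n2 * A(xi2)) / sigma ** 2
    l_ss = -(3.0 * n * m2 / sigma ** 2 - n + n1 * C(-xi1) + n2 * C(xi2)) / sigma ** 2
    # Solve the two linear equations (21) for the corrections h and k
    det = l_mm * l_ss - l_ms ** 2
    h = (-l_mu * l_ss + l_sigma * l_ms) / det
    k = (-l_sigma * l_mm + l_mu * l_ms) / det
    return mu + h, sigma + k, (l_mm, l_ms, l_ss)


# Values from the time-mortality example of Section 6.
mu1, sigma1, coeffs = newton_cycle_censored(
    1.643299, 0.202646, 1.301030, 0.602060, 2, 40, 5, 1.620111, 0.0217392)
```

The returned second derivatives, with signs reversed, form the matrix whose inverse approximates the asymptotic variance-covariance matrix of the estimates.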










5. SINGLY CENSORED SAMPLES 


For a singly censored sample in which, out of a fixed total of N observations, there are n measured observations for which $x>x_0$ and $n_1 = N-n$ unmeasured observations for which it is known only that $x<x_0$, the estimating equations as given by Cohen (1950) are

$$
\left.\begin{aligned}
{[1-\xi(Y-\xi)]}/(Y-\xi)^2-\nu_2/\nu_1^2 &= 0,\\
\sigma &= \nu_1/(Y-\xi),\\
\mu &= x_0-\sigma\xi.
\end{aligned}\right\} \tag{30}
$$

These results follow from (24) when we let $\xi_2\to\infty$ and subsequently drop the subscripts from $\xi_1$ and $Y_1$. The first equation of (30) can be solved for $\hat\xi$. Then $\hat\sigma$ follows from the second, with $\hat\mu$ being determined from the third as in the doubly restricted cases.

If, however, $(Y-\xi)$ is eliminated between the first two equations of (30), we thereby obtain

$$\frac{1-\xi\,\nu_1/\sigma}{(\nu_1/\sigma)^2} = \frac{\nu_2}{\nu_1^2}, \tag{31}$$

and it follows that

$$\xi = \frac{\sigma^2-\nu_2}{\nu_1\sigma}. \tag{32}$$

Upon substituting this value for $\xi$ into the third equation of (30), we have*

$$\hat\mu = x_0-(\hat\sigma^2-\nu_2)/\nu_1. \tag{33}$$

Thus as an alternate procedure for calculating estimates, we might solve the first equation of (30) for $(Y-\xi)$ rather than for $\xi$, then calculate $\hat\sigma$ using the second equation of (30) as before, and calculate $\hat\mu$ from (33).
Since $Y(\xi) = (n_1/n)\,Z(-\xi)$ in the notation of equation (26), the Biometrika Editorial tables (1955) or tables of normal curve areas and ordinates permit ready evaluation of this function, and it has been possible to prepare a chart, included here as Fig. 3, of the family of curves

$$\nu_2/\nu_1^2 = [1-\xi(Y-\xi)]/(Y-\xi)^2,$$

plotted as functions of $\xi$ for the various values of h, where

$$h = n_1/(n+n_1). \tag{34}$$

To the extent of the range covered, tables of Hald (1949) were employed in plotting these curves. For values beyond the range of Hald’s tables, computations were based directly on tables of the normal curve. In using the chart of Fig. 3, $\nu_2/\nu_1^2$ and h are computed from the sample data, and with these values known, $\hat\xi$ is read along the horizontal scale. When more accurate results are required, $[1-\xi(Y-\xi)]/(Y-\xi)^2$ can be calculated for additional values of $\xi$ so that a more accurate determination of $\hat\xi$ or of $(Y-\hat\xi)$ can be obtained through interpolation.
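Where a chart or table is not at hand, the first equation of (30) can also be solved numerically. The following Python sketch uses plain bisection in place of chart reading and interpolation; it is not the procedure of the paper. The function name and the default bracket are assumptions, and the bracket must lie where $Y-\xi>0$ and must enclose the root.

```python
import math


def phi(t):
    """Standard normal ordinate."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Z(t):
    """Reciprocal of Mills' ratio, ordinate over upper-tail area, as in (26)."""
    return phi(t) / (0.5 * math.erfc(t / math.sqrt(2.0)))


def solve_singly_censored(x0, n, n1, nu1, nu2, lo=-4.0, hi=0.0, tol=1e-10):
    """Solve (30) by bisection; lo and hi are an assumed bracket for xi."""
    target = nu2 / nu1 ** 2

    def y_minus_xi(xi):
        return (n1 / n) * Z(-xi) - xi

    def g(xi):
        d = y_minus_xi(xi)
        return (1.0 - xi * d) / d ** 2 - target

    g_lo = g(lo)
    if g_lo * g(hi) > 0.0:
        raise ValueError("the bracket [lo, hi] does not enclose a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    xi = 0.5 * (lo + hi)
    sigma = nu1 / y_minus_xi(xi)   # second equation of (30)
    mu = x0 - sigma * xi           # third equation of (30)
    return xi, sigma, mu


# Data of the singly censored example in Section 6.
xi_hat, sigma_hat, mu_hat = solve_singly_censored(70.00, 50, 3, 10.654, 145.2426)
```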


The function $(Y-\xi)$ has been tabulated by Gupta (1952) as a function of p and $\psi$, where

$$p = 1-h\quad\text{and}\quad\psi = s^2/\nu_1^2.$$

Corresponding to observed values of p and $\psi$, $(Y-\xi)$, which Gupta denotes as z, is obtained by interpolation from his table. Estimates of the population standard deviation and mean


* This result was given in an equivalent form by Gupta (1952). Expressed in the notation of this paper, his estimator may be written as $\hat\mu = \bar x+(\hat\sigma^2-s^2)/(x_0-\bar x)$.
















are then calculated from the second equation of (30) and from (33) respectively. Gupta’s 
procedure accordingly enjoys the advantage of requiring interpolation in only one table, 



























































































but at any rate in parts of his table his tabular intervals seem too large for easy interpolation.*
[Fig. 3. Estimation curves for singly censored samples.]

6. ILLUSTRATIVE EXAMPLES 
Doubly truncated sample. To illustrate estimation in the doubly truncated case, we consider an example in which the entire production of a certain bushing is sorted through go, no-go gauges, with the result that items of diameter in excess of 0.6015 in. and those
[* It seems to be worth considering whether this disadvantage might be overcome by an expansion of Gupta’s table, using finer tabular intervals. Ed.]










less than 0.5985 in. are discarded. For a random sample of 75 bushings selected from the screened production,

$\bar x = 0.60014933$ in., $s^2 = 0.000000371187$, $x_0 = 0.5985$ and $w = 0.0030$.

Thus $\nu_1 = \bar x-x_0 = 0.00164933$, $\nu_1/w = 0.54978$, $s^2/w^2 = 0.041242$, and visual interpolation between the curves of Fig. 1 gives $\xi_1^{(0)} = -2.50$ and $\xi_2^{(0)} = 2.00$ as initial approximations. These values might conceivably be accurate enough for the purpose of this sample. However, as a demonstration of the iterative processes described in §3, we proceed to determine more accurate solutions of the estimating equations using those methods. Accordingly, two cycles of the iterants of (19) yield

$\xi_2^{(1)} = 1.997$, $\xi_1^{(1)} = -2.522$ and $\xi_2^{(2)} = 1.997$, $\xi_1^{(2)} = -2.525$,

respectively. For final estimates, we employ Newton’s method with

$\sigma_0 = w/(\xi_2^{(2)}-\xi_1^{(2)}) = 0.0030/(1.997+2.525) = 0.00066342$

and $\mu_0 = 0.5985-(0.00066342)(-2.525) = 0.60017514$.

With the derivatives of (22) evaluated for these values of $\mu_0$ and $\sigma_0$ ($\xi_1 = -2.525$, $\xi_2 = 1.997$), corrections h and k are obtained by solving (21) which for this sample becomes

$$-150706576\,h+27235297\,k = -7.076300,$$

$$27235297\,h-187700371\,k = 75.019711.$$

Thereby we obtain $h = -0.00000003$, $k = -0.00000040$, and as final estimates we have

$\hat\mu = 0.60017514-0.00000003 = 0.60017511$,

$\hat\sigma = 0.00066342-0.00000040 = 0.00066302$.


Since coefficients of the correction equations in h and k are approximately equal to expected values of corresponding second partial derivatives of L, the asymptotic variance-covariance matrix of $\hat\mu$ and $\hat\sigma$ may, with very little additional effort, be approximated as

$$\begin{pmatrix}150706576 & -27235297\\ -27235297 & 187700371\end{pmatrix}^{-1} = \begin{pmatrix}6.814\times10^{-9} & 0.9887\times10^{-9}\\ 0.9887\times10^{-9} & 5.471\times10^{-9}\end{pmatrix}.$$

Thus,

$V(\hat\mu)\sim 0.000000006814$, $V(\hat\sigma)\sim 0.000000005471$ and $\operatorname{cov}(\hat\mu,\hat\sigma)\sim 0.0000000009887$.


Doubly censored sample. In a certain time-mortality experiment, the first observation is delayed until the elapse of a fixed time interval, with the result that $n_1$ sample specimens die before observation begins. The experiment is subsequently terminated at a predetermined time with $n_2$ specimens remaining alive. Actual survival times are recorded for the n specimens which die during the period of full observation. For a specific sample of this type in which x designates the logarithm of survival time in days and is assumed to be normally distributed $(\mu, \sigma)$,

$n_1 = 2$, $n = 40$, $n_2 = 5$, $x_0 = 1.301030$, $w = 0.602060$, $\bar x = 1.620111$,
$\nu_1 = 0.319081$, $s^2 = 0.0217392$, $\nu_1/w = 0.529982$ and $s^2/w^2 = 0.0599741$.

Neglecting knowledge of $n_1$ and $n_2$, we read $\xi_1 = -1.71$ and $\xi_2 = 1.36$ from Fig. 1. Subsequently we apply (27) to obtain $\xi_1 = -1.72$ and $\xi_2 = 1.25$ from tables of normal curve







areas. The two sets of readings are in reasonably close agreement, and we decide to begin the iteration with initial values of $\xi_1^{(0)} = -1.72$, $\xi_2^{(0)} = 1.28$. The iterants of (19) with $Z_1$ and $Z_2$ replaced by $Y_1$ and $Y_2$ respectively are employed to obtain the results tabulated below.

    i     xi_1^(i)    xi_2^(i)
    0     -1.72       1.28
    1     -1.681      1.278
    2     -1.689      1.282
    3     -1.688      1.281
    4     -1.688      1.281


As a demonstration of Newton’s method in this case, we start with the results obtained above at the end of the second cycle, that is, with $\xi_1^{(2)} = -1.689$ and $\xi_2^{(2)} = 1.282$. Accordingly

$\sigma_0 = 0.602060/(1.282+1.689) = 0.202646$

and $\mu_0 = 1.301030-(0.202646)(-1.689) = 1.643299$.

With derivatives of (28) calculated for these values of $\mu_0$ and $\sigma_0$, equations (21) become

$$-1117.362555\,h+52.976746\,k = 0.00956735,$$

$$52.976746\,h-1791.254507\,k = -0.23456759.$$

On solution, we obtain $h = -0.00000236$ and $k = 0.00013089$, and as final estimates we have

$\hat\mu = 1.643299-0.000002 = 1.643297$,

$\hat\sigma = 0.202646+0.000131 = 0.202777$,

which correspond to $\hat\xi_1 = -1.68790$ and $\hat\xi_2 = 1.28122$. It is to be noted that when rounded off to three decimals, these two latter values agree with those obtained using the iterants of (19).

Using coefficients of the above correction equations, the asymptotic variance-covariance matrix of $\hat\mu$ and $\hat\sigma$ may be approximated as

$$\begin{pmatrix}1117.3626 & -52.9767\\ -52.9767 & 1791.2545\end{pmatrix}^{-1} = \begin{pmatrix}0.0008962 & 0.0000265\\ 0.0000265 & 0.0005591\end{pmatrix}.$$

Thus we have

$V(\hat\mu)\sim 0.0008962$, $V(\hat\sigma)\sim 0.0005591$ and $\operatorname{cov}(\hat\mu,\hat\sigma)\sim 0.0000265$.


Singly censored sample. To illustrate estimation from a singly censored sample, we consider one for which $x_0 = 70.00$, $n = 50$, $n_1 = 3$, $\nu_1 = 10.654$, $\nu_2 = 145.2426$. Thus, $h = 3/(3+50) = 0.05660$ and $\nu_2/\nu_1^2 = 1.27958$. Reading from Fig. 3 we have $\hat\xi = -1.56$ as a first approximation. Since the computing routine is simple, the accuracy of this result can readily be improved by calculating additional values of $[1-\xi(Y-\xi)]/(Y-\xi)^2$ using a table of normal curve areas and ordinates, and then interpolating as summarized below.










When the Biometrika Editorial tables (1955) are available, the required values of $Z(-\xi)$ can be read directly without the necessity of the additional calculations involved in using normal curve tables.

    xi         Y - xi      [1 - xi(Y - xi)]/(Y - xi)^2
    -1.560     1.67939     1.28347
    -1.569     1.68899     1.27958
    -1.570     1.68990     1.27921











With $\hat\xi = -1.569$ and $(Y-\hat\xi) = 1.68899$, we estimate $\sigma$ and $\mu$ from the second and third equations of (30) as

$\hat\sigma = 10.654/1.68899 = 6.308$,

$\hat\mu = 70.00-(-1.569)(6.308) = 79.90$.

Alternately, we might have estimated $\mu$ from (33) to obtain the same value as above:

$\hat\mu = 70.00-(6.308^2-145.2426)/10.654 = 79.90$.


REFERENCES 


Cohen, A. C., Jr. (1950). Estimating the mean and variance of normal populations from singly truncated and doubly truncated samples. Ann. Math. Statist. 21, 557.

Cohen, A. C., Jr. & Woodward, John (1953). Tables of Pearson-Lee-Fisher functions of singly truncated normal distributions. Biometrics, 9, 489.

Editorial (1955). The normal probability function: Tables of certain area-ordinate ratios and their reciprocals. Biometrika, 42, 217.

Gupta, A. K. (1952). Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika, 39, 260.

Hald, A. (1949). Maximum likelihood estimation of the parameters of a normal distribution which is truncated at a known point. Skand. AktuarTidskr. 32, 119.

Norton, H. W. (1956). One likelihood adjustment may be inadequate. Biometrics, 12, 79.

Raj, Des (1953). On moments estimation of the parameters of a normal population from singly and doubly truncated samples. Ganita, 4, 79.

Scarborough, J. B. (1930). Numerical Mathematical Analysis. Baltimore: Johns Hopkins Press.

Thomson, G. W., Friedman, M. & Garelis, E. (1954). Estimation of mean and variance of normal populations from doubly truncated samples. Report of Ethyl Corporation Research Laboratories, Detroit.





















UPPER PERCENTAGE POINTS OF THE GENERALIZED 
BETA DISTRIBUTION. I 


By F. G. FOSTER AND D. H. REES


Research Techniques Unit, London School of Economics
and Rothamsted Experimental Station, Harpenden


1. INTRODUCTION


In sampling from k-variate Normal populations, if A and B are independent estimates, based on $\nu_1$ and $\nu_2$ degrees of freedom, of what on a null hypothesis is the same dispersion matrix, the roots of the determinantal equation

$$|B-\lambda A| = 0$$

are invariant under all linear transformations of the k variates. They are thus unaffected,
for example, by a change in the unit of measurement, and so are independent of the magni- 
tudes of the parent variances and covariances. There are clearly advantages in testing the 
null hypothesis by means of some function of the roots, and arguments for their use have 
been advanced (Hotelling, 1947, 1951, 1954; Roy, 1945, 1950, 1953, 1954). Among the 
functions proposed* are: (i) the product of the roots (Wilks, 1932; Pearson & Wilks, 1933); 
(ii) the sum of the roots (Hotelling, 1947, 1951); (iii) the greatest (or least) root (Roy, 1950, 
1953, 1954). We are concerned here with the tabulation of the greatest root, for the case, 
k = 2. The computations are based on a recursion formula of Roy (1945). Similar tabulations 
for the cases k = 3, 4 and 5 are in hand. 
If A is based on $\nu_1$ and B on $\nu_2$ degrees of freedom, the k roots of

$$|\nu_2B-\theta(\nu_1A+\nu_2B)| = 0$$

all lie between 0 and 1 and are related to the roots $\lambda$ of the previous equation by

$$\theta = \frac{\nu_2\lambda}{\nu_1+\nu_2\lambda}.$$

Thus the roots $\lambda$ are immediately obtainable from the roots $\theta$ and vice versa. The relation corresponds to that which exists between the F and the Beta distributions in the univariate case. We have tabulated the 80, 85, 90, 95 and 99% points of the distribution of the greatest root $\theta$ on the null hypothesis of identical parent dispersion matrices. The tables extend previous tables by Pillai (1956) and Nanda (1951).

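The joint distribution of the k roots, $\theta_1, \theta_2, \ldots, \theta_k$, has been given by Fisher (1939), Hsu (1939), Roy (1939) and Girshick (1939) (cf. also Mood, 1951). The density function is

$$K\prod_{i=1}^{k}\theta_i^{p-1}(1-\theta_i)^{q-1}\prod_{i>j}(\theta_i-\theta_j)\qquad(0<\theta_1<\ldots<\theta_k<1),$$

where

$$K = \pi^{k/2}\prod_{i=1}^{k}\frac{\Gamma(\tfrac12(2p+2q+k+i-2))}{\Gamma(\tfrac12(2p+i-1))\,\Gamma(\tfrac12(2q+i-1))\,\Gamma(\tfrac12 i)}$$

and $p = \tfrac12(\nu_2-k+1)$, $q = \tfrac12(\nu_1-k+1)$.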


* The question as to the relative merits of these functions in different circumstances does not appear 
to have been fully resolved, and will not be taken up here. A possible approach will be to carry out 
distribution sampling experiments to obtain the power functions of the tests against various alternatives. 








Thus the distribution function of the greatest root is given by

$$I_x(k;p,q) = K\int_0^x d\theta_k\int_0^{\theta_k}d\theta_{k-1}\cdots\int_0^{\theta_2}d\theta_1\prod_{i=1}^{k}\theta_i^{p-1}(1-\theta_i)^{q-1}\prod_{i>j}(\theta_i-\theta_j).$$

It will be seen that $I_x(k;p,q)$ reduces to the Beta distribution $I_x(p,q)$ when $k = 1$, and for this reason we have proposed the name ‘Generalized Beta distribution’ for it.


2. INTERPOLATION 


For use of the tables in connexion with tests of significance, the value of x will be required for the percentage levels tabulated and for integer values of $\nu_2 = 2p+1$, $\nu_1 = 2q+1$. Linear interpolation, either p-wise or q-wise, within the body of the tables will usually suffice for accuracy to at least two decimal places. For greater accuracy 3- or 4-point Lagrangian interpolation should be used. Extrapolation beyond $\nu_2 = 21$ ($p = 10$) is not possible, and the necessity for this is not expected to arise. For q-wise interpolation between $\nu_1 = 161$ ($q = 80$) and $\nu_1 = \infty$, 3-point harmonic interpolation, based on $q = 40$, $q = 80$, $q = \infty$ may be used.

Example. Find the 95% point corresponding to $\nu_2 = 13$ ($p = 6$) and $\nu_1 = 403$ ($q = 201$):

















    nu_1     q       1/q      x         first difference    second difference
    inf      inf     0        0
                                        0.1506
    161      80      1/80     0.1506                        -0.0304
                                        0.1202
    81       40      1/40     0.2708

The values, 1/q, are equally spaced, with interval 1/80, and we require the value of x corresponding to $1/q = 1/201$. Thus

$$
\begin{aligned}
x &= 0+\frac{1/201}{1/80}\,\Delta x+\frac{(1/201)(1/201-1/80)}{2!\,(1/80)^2}\,\Delta^2x\\
&= \frac{80}{201}\,(0.1506)+\frac{80\times121}{2\times201^2}\,(0.0304)\\
&= 0.064,
\end{aligned}
$$

which is correct to 3 decimal places.
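The arithmetic of this example can be checked with a short Python sketch of the 3-point formula in the variable 1/q. The function name is hypothetical, and the tabular point at $q = \infty$ is taken as $x = 0$, as in the difference table above.

```python
def harmonic_interpolate(q, x_at_80, x_at_40):
    """Three-point interpolation in 1/q through the entries at q = inf, 80 and 40."""
    u = (1.0 / q) / (1.0 / 80.0)           # position measured in intervals of 1/80
    d1 = x_at_80 - 0.0                     # first difference from the q = inf entry
    d2 = (x_at_40 - x_at_80) - d1          # second difference
    return u * d1 + 0.5 * u * (u - 1.0) * d2


x_201 = harmonic_interpolate(201, 0.1506, 0.2708)
```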


3. USES OF THE TABLES 


We illustrate two typical applications to tests of significance discussed by Roy (1945, 1950): 
the analysis of dispersion (i) of means and (ii) of regression. In both cases the appropriate 
critical region is the upper tail of the distribution (cf. Rao, 1952, chapter 7). 

It should be noted that a direct application to testing the equality of two dispersion 
matrices, when we do not know which should be the ‘larger’, would require the joint 
distribution of the greatest and least roots. This would correspond to the univariate test 
in which we use both tails of the F (or Beta) distribution as critical region. Since the greatest 
and least roots are not independently distributed, these tables are not appropriate for this 
type of test. 










(i) Analysis of dispersion of means 


Let us select samples of sizes $n_1, n_2, \ldots, n_r$ from r bivariate Normal populations, having the same dispersion matrix. It is required to test the hypothesis that the r mean vectors of the populations are equal. Let $n_1+n_2+\ldots+n_r = N$. The combined sums of products matrix (S) based on $N-1$ degrees of freedom, may be analysed into sums of products due to the means (Q) with $r-1$ degrees of freedom and sums of products due to the error (W) with $N-r$ degrees of freedom:

$$S = Q+W.$$

Then Q and W, divided by their degrees of freedom, are independent estimates of the same parent dispersion matrix, and our test consists in computing the greatest root of $|Q-\theta S| = 0$, and entering the table with $\nu_1 = N-r$, $\nu_2 = r-1$.

Example. Table 1 gives the analysis of dispersion for the two characters $x_1 = \log_{10}$ (tooth length) and $x_2 = \log_{10}$ (maximum breadth) of the permanent upper second premolar for the three male groups, human (West African), chimpanzee and orang-outang. The groups have been selected from a larger body of data kindly made available to us by the authors (Ashton, Healy & Lipton, 1957). The logarithmic transformation was used to stabilize the within-group variance. Then

$$Q = \begin{pmatrix}0.544941 & 0.525768\\ 0.525768 & 0.509075\end{pmatrix},\qquad S = \begin{pmatrix}0.682727 & 0.595110\\ 0.595110 & 0.601867\end{pmatrix}.$$

Table 1

                       Degrees of              Sums of products matrix
                       freedom                 x1^2        x1 x2       x2^2
    Between groups     2            Q_ij       0.544941    0.525768    0.509075
    Within groups      154          W_ij       0.137786    0.069342    0.092792
    Total              156          S_ij       0.682727    0.595110    0.601867







The roots of $|Q-\theta S| = 0$ are now obtained by solving the quadratic

$$(0.544941-0.682727\,\theta)(0.509075-0.601867\,\theta)-(0.525768-0.595110\,\theta)^2 = 0.$$

We find that $\theta_1 = 0.020238$, $\theta_2 = 0.856543$.

With $\nu_2 = 2$ and $\nu_1 = 154$, $\theta_{\max} = 0.857$ is seen to be highly significant, since from the tables we see that the 99% point for $\nu_2 = 2$, $\nu_1 = 121$ is 0.096, and for $\nu_2 = 2$, $\nu_1 = 161$ it is 0.073.
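For the bivariate case the determinantal equation is an ordinary quadratic in $\theta$, and the roots quoted above can be reproduced with the following Python sketch. The function name is hypothetical; the matrices are those of Table 1.

```python
import math


def roots_2x2(Q, S):
    """Roots of |Q - theta*S| = 0 for symmetric 2 x 2 matrices, smaller root first."""
    (q11, q12), (_, q22) = Q
    (s11, s12), (_, s22) = S
    a = s11 * s22 - s12 ** 2                       # coefficient of theta^2, equal to |S|
    b = q11 * s22 + q22 * s11 - 2.0 * q12 * s12    # minus the coefficient of theta
    c = q11 * q22 - q12 ** 2                       # constant term, equal to |Q|
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (b - disc) / (2.0 * a), (b + disc) / (2.0 * a)


Q = [[0.544941, 0.525768], [0.525768, 0.509075]]
S = [[0.682727, 0.595110], [0.595110, 0.601867]]
theta_min, theta_max = roots_2x2(Q, S)
```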

The group means are given in Table 2. 










For the sake of comparison we illustrate the alternative test criterion based on the product of the roots, using these data. The statistic $\Lambda = |W|/|S|$ is equal to the product $\prod_{i=1}^{k}(1-\theta_i)$. This criterion is arrived at by the likelihood ratio method by Pearson & Wilks (1933), and for $k = 2$, $\sqrt\Lambda$ has exactly the Beta distribution $I_x(\nu_1-1, \nu_2)$ on the null hypothesis. Applied to the above data, this gives $\Lambda = 0.140554$. Therefore $\sqrt\Lambda = 0.375$, and on the null hypothesis that neither $x_1$ nor $x_2$ differs from group to group, $\sqrt\Lambda$ should be distributed according to $I_x(153, 2)$. The lower tail of the distribution being here the critical region, this is again highly significant, since $\Pr(\sqrt\Lambda<0.375) = 0.65\times10^{-63}$. (Another example is given by Pearson & Wilks, 1933, p. 370.)








Table 2

                             No. in group    Mean x1    Mean x2
    Human (West African)     59              1.846      1.981
    Chimpanzee               55              1.865      2.008
    Orang-outang             43              1.986      2.119








(ii) Analysis of dispersion of regression 


Let $x_1, x_2, \ldots, x_s, x_{s+1}, x_{s+2}$ be $(s+2)$ correlated variables for which a combined sums-of-products matrix $(x_{ij})$, based on $\nu$ degrees of freedom, is obtained. It is required to test the hypothesis that the linear regression of $x_{s+1}$, $x_{s+2}$ on the other s variables,

$$\mathscr{E}(x_{s+1}) = \sum_{i=1}^{s}b_{i,s+1}x_i,\qquad\mathscr{E}(x_{s+2}) = \sum_{i=1}^{s}b_{i,s+2}x_i,$$

has zero regression coefficients, i.e. that the variation in $x_{s+1}$, $x_{s+2}$ is independent of that due to the other variables.


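The sums of products matrix for $x_1, \ldots, x_s$ is

$$\begin{pmatrix}x_{11} & \cdots & x_{1s}\\ \vdots & & \vdots\\ x_{s1} & \cdots & x_{ss}\end{pmatrix}.$$

Let the inverse be

$$\begin{pmatrix}x^{11} & \cdots & x^{1s}\\ \vdots & & \vdots\\ x^{s1} & \cdots & x^{ss}\end{pmatrix}.$$

Then the sums-of-products matrix for $x_{s+1}$, $x_{s+2}$ due to $x_1, \ldots, x_s$ is

$$Q = \begin{pmatrix}x_{1,s+1} & \cdots & x_{s,s+1}\\ x_{1,s+2} & \cdots & x_{s,s+2}\end{pmatrix}\begin{pmatrix}x^{11} & \cdots & x^{1s}\\ \vdots & & \vdots\\ x^{s1} & \cdots & x^{ss}\end{pmatrix}\begin{pmatrix}x_{1,s+1} & x_{1,s+2}\\ \vdots & \vdots\\ x_{s,s+1} & x_{s,s+2}\end{pmatrix}.$$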










On the null hypothesis Q yields an estimate, with s degrees of freedom, of the dispersion matrix of $x_{s+1}$, $x_{s+2}$. The sum of products matrix for $x_{s+1}$, $x_{s+2}$ due to the error, after correcting for $x_1, \ldots, x_s$, is now

$$W = \begin{pmatrix}x_{s+1,s+1} & x_{s+1,s+2}\\ x_{s+2,s+1} & x_{s+2,s+2}\end{pmatrix}-Q.$$

This yields an estimate based on $\nu-s$ degrees of freedom, which, on the null hypothesis, is independent of the previous one. Therefore

$$S = \begin{pmatrix}x_{s+1,s+1} & x_{s+1,s+2}\\ x_{s+2,s+1} & x_{s+2,s+2}\end{pmatrix}$$

and our test consists in computing $\theta_{\max}$ for

$$|Q-\theta S| = 0,$$

and entering the table with $\nu_1 = \nu-s$, $\nu_2 = s$.
Example. The following five measurements were made on the permanent upper first incisor of 83 female gorillas.

    y1 = basal width,
    y2 = maximum width,
    y3 = basal thickness,
    y4 = maximum height,
    y5 = thickness at maximum width.

This data is again taken from the same source acknowledged in the previous example. For the reason given there the logarithmic transforms of these measurements, $x_i = \log_{10}y_i$, were taken and the following sums-of-products matrix (divided by 82) obtained, the columns referring to $x_1, \ldots, x_5$ in order:

$$10^{-4}\begin{pmatrix}13.03\\ 5.77 & 12.36\\ 4.90 & 8.33 & 11.88\\ 3.83 & 39.14 & 28.38 & 229.36\\ -1.95 & -44.75 & -30.95 & -261.52 & 388.31\end{pmatrix}.$$

It is required to test the significance of the linear regressions:

$$\mathscr{E}(x_4) = b_{14}x_1+b_{24}x_2+b_{34}x_3,$$

$$\mathscr{E}(x_5) = b_{15}x_1+b_{25}x_2+b_{35}x_3.$$

The inverse of the sums-of-products matrix for $x_1$, $x_2$ and $x_3$ is

$$10^{4}\begin{pmatrix}0.098298\\ -0.035196 & 0.165996\\ -0.015965 & -0.101876 & 0.162152\end{pmatrix} = A^{-1}.$$

The sums-of-products matrix due to the regressions is given by

$$10^{-8}\begin{pmatrix}3.83 & 39.14 & 28.38\\ -1.95 & -44.75 & -30.95\end{pmatrix}A^{-1}\begin{pmatrix}3.83 & -1.95\\ 39.14 & -44.75\\ 28.38 & -30.95\end{pmatrix} = 10^{-4}\begin{pmatrix}146.01\\ -169.64 & \ldots\end{pmatrix} = Q$$

and has 3 D.F.

The total sums-of-products matrix before the regression is

$$S = 10^{-4}\begin{pmatrix}229.36\\ -261.52 & 388.31\end{pmatrix}$$

with 82 D.F.

The greatest latent root of $|Q-\theta S| = 0$ is $\theta_{\max} = 0.637$. With $\nu_2 = 3$ and $\nu_1 = 79$ this is seen to be highly significant, since from the tables we see that the 99% point for $\nu_2 = 3$ and $\nu_1 = 71$ is 0.186, and for $\nu_2 = 3$, $\nu_1 = 81$ it is 0.165.


4. METHOD OF COMPUTATION 


The computations were carried out on the N.R.D.C. Elliott 401 Computer at Rothamsted Experimental Station. We restrict consideration below to the case $k = 2$ for which the tabulations have been made. Let $\theta_{\max}$ denote the greatest root of

$$|\nu_2B-\theta(\nu_1A+\nu_2B)| = 0,$$

where A and B are independent estimates, based on $\nu_1$ and $\nu_2$ degrees of freedom, of a parent dispersion matrix of a bivariate Normal population. Define

$$I_x(2;p,q) = \Pr\{\theta_{\max}\le x\},$$

where $p = \tfrac12(\nu_2-1)$, $q = \tfrac12(\nu_1-1)$. Pillai (1956) gives a formula which may be written

$$I_x(2;p,q) = K\{2B_x(2p,2q)-x^p(1-x)^qB_x(p,q)\},$$

where

$$K = \frac{\pi^{1/2}\,\Gamma(p+q)\,\Gamma(p+q+\tfrac12)}{\Gamma(p)\,\Gamma(p+\tfrac12)\,\Gamma(q)\,\Gamma(q+\tfrac12)},$$


and $B_x(p,q)$ is the Incomplete Beta function. As it stands, this formula is not suitable for computation. If, however, we distribute the normalizing constant, K, we obtain the formula

$$I_x(2;p,q) = I_x(2p,2q)-I_x(p,q)\,a_x(p,q),$$

where $I_x(p,q)$ is the Beta distribution, and

$$a_x(p,q) = \frac{\pi^{1/2}\,\Gamma(p+q+\tfrac12)}{\Gamma(p+\tfrac12)\,\Gamma(q+\tfrac12)}\,x^p(1-x)^q.$$

$I_x(p,q)$ may be computed recursively for integral values of p and q by means of the relations

$$
\begin{aligned}
I_x(0,q) &= 1 &&(q = 0, 1, 2, \ldots),\\
I_x(p,0) &= 0 &&(p>0),\\
I_x(p,q) &= x\,I_x(p-1,q)+(1-x)\,I_x(p,q-1) &&(p,q>0),
\end{aligned}
$$

and $a_x(p,q)$ may be computed by means of

$$
\begin{aligned}
a_x(0,0) &= 1,\\
a_x(p,0) &= x\,a_x(p-1,0) &&(p>0),\\
a_x(0,q) &= (1-x)\,a_x(0,q-1) &&(q>0),\\
a_x(p,q) &= \left(1+\frac{1}{2(p+q-1)}\right)\{x\,a_x(p-1,q)+(1-x)\,a_x(p,q-1)\} &&(p,q>0).
\end{aligned}
$$
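The recurrence relations translate directly into a short program. The Python sketch below is an illustration for integral p and q only, not a reconstruction of the Elliott 401 programme; the function names are hypothetical. For $p = q = 1$ the density of the two roots is proportional to $\theta_2-\theta_1$, so that the distribution function of the greatest root is $x^3$, which gives a simple check.

```python
def beta_table(x, p_max, q_max):
    """I_x(p, q) for 0 <= p <= p_max, 0 <= q <= q_max by the recurrence relation."""
    table = [[0.0] * (q_max + 1) for _ in range(p_max + 1)]
    for q in range(q_max + 1):
        table[0][q] = 1.0                      # I_x(0, q) = 1
    for p in range(1, p_max + 1):
        table[p][0] = 0.0                      # I_x(p, 0) = 0 for p > 0
        for q in range(1, q_max + 1):
            table[p][q] = x * table[p - 1][q] + (1.0 - x) * table[p][q - 1]
    return table


def a_table(x, p_max, q_max):
    """a_x(p, q) on the same grid by its recurrence relation."""
    table = [[0.0] * (q_max + 1) for _ in range(p_max + 1)]
    table[0][0] = 1.0
    for p in range(1, p_max + 1):
        table[p][0] = x * table[p - 1][0]
    for q in range(1, q_max + 1):
        table[0][q] = (1.0 - x) * table[0][q - 1]
    for p in range(1, p_max + 1):
        for q in range(1, q_max + 1):
            weight = 1.0 + 1.0 / (2.0 * (p + q - 1))
            table[p][q] = weight * (x * table[p - 1][q] + (1.0 - x) * table[p][q - 1])
    return table


def greatest_root_cdf(x, p, q):
    """I_x(2; p, q) = I_x(2p, 2q) - I_x(p, q) a_x(p, q), integral p and q >= 1."""
    beta = beta_table(x, 2 * p, 2 * q)
    a = a_table(x, p, q)
    return beta[2 * p][2 * q] - beta[p][q] * a[p][q]


value = greatest_root_cdf(0.5, 1, 1)   # equals 0.5 ** 3 for p = q = 1
```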













By means of these relations, we computed tables of the distribution function $I_x(2;p,q)$ for $x = 0.01\,(0.01)\,0.99$ and for the ranges $p = 1\,(1)\,10$, $q = 2\,(1)\,20\,(5)\,50, 60, 80$. A copy of these tables is available in the Library of the Royal Statistical Society. The percentage points given below were obtained, in the main, by inverse interpolation on these tables, by means of a desk calculating machine. The values have an error of less than two units in the fourth decimal place, and in general* this error is less than 0.5. Separate computations were made later to obtain the values corresponding to $p = 0.5$.

Since for fixed x and p, $I_x(p,q)\to1$ as $q\to\infty$, we have the approximation for small p,

$$I_x(2;p,q)\approx1-a_x(p,q).$$

Within the range of p and q used, we found empirically that this formula was applicable when $I_x(2;p,q)$ was greater than about 0.92. The computation was switched to the approximation whenever it became applicable, which substantially increased the speed.

It will be seen from the above relations that the natural way to compute $I_x(2;p,q)$ would
be not for the range of x for each p and q, as is required for tables of the distribution function, 
but for fixed x recursively through for p and q. To have obtained a final tabulation of the 
distribution function by this method we should have had to store the information consisting 
of about 26,000 numbers. This was beyond the internal capacity of the machine, and examina- 
tion of a scheme to output the information for subsequent editing on the machine showed 
that the method would have seriously reduced the advantage of the more direct method of 
computation. 

The tables were therefore prepared on the basis of a fixed q for all p and then all x. The disadvantage that the same values of $I_x(p,q)$ and $a_x(p,q)$ have to be repeatedly computed was to some extent offset by the gain in flexibility of the programme in that (i) the calculations could be restarted at any values of x, p, q, in case of any machine failure and (ii) it was possible to modify the programme easily at a later stage in order to check directly the accuracy of the percentage points obtained by inverse interpolation in the tables.

The tables were printed to four decimal places directly on the typewriter output in a 
normal table format: each page consisting of the distribution for one value of q, arranged in 
ten columns corresponding to p = 1(1)10 and with x = 0-01 (0-01) 0-99 in groups of five. 
To economize on the output time, the following controls were included in the programme. 
The fact that $I_x(2; p, q)$ is monotonically decreasing for increasing p was used to avoid
printing zeros: as soon as $I_x(2; p, q)$ became less than 0·00005, nothing more was printed
on that line and a fresh calculation was started with x increased by 0·01. Whenever $I_x(2; 10, q)$
became greater than 0·99995 the calculation was restarted with the next value of q and
x = 0·01.


5. FURTHER WORK 
In the light of the experience gained with this computation, we have decided to forgo the
tabulations of the distribution functions in the projected computations for k = 3, 4 and 5,
but to obtain the percentage points directly, in one automatic computation, utilizing the
natural order of computation and using a method of inverse interpolation.


These tables were made in consultation with Prof. S. N. Roy, for whose advice the authors
wish to express their thanks. 


* The largest error is at the 99% points for q < 20.








244 Upper percentage points of the generalized Beta distribution. I 


REFERENCES 


Ashton, E. H., Lipton, S. & Healy, M. J. R. (1957). The descriptive use of discriminant functions
in physical anthropology. Proc. Roy. Soc. B (in the Press).

Fisher, R. A. (1939). The sampling distribution of some statistics obtained from non-linear equations.
Ann. Eugen., Lond., 9, 238-49. Also published in Contributions to Mathematical Statistics, paper 36,
1950. New York: John Wiley and Sons.

Girshick, M. A. (1939). On the sampling theory of the roots of determinantal equations. Ann.
Math. Statist. 10, 203-24.

Hotelling, H. (1947). Multivariate quality control illustrated by air testing of sample bombsights.
From Selected Techniques of Statistical Analysis, ed. Eisenhart et al. New York.

Hotelling, H. (1951). A generalized T test and measure of multivariate dispersion. Proceedings
of the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University
of California Press.

Hotelling, H. (1954). Multivariate analysis. From Statistics and Mathematics in Biology, ed.
Kempthorne et al. Iowa: Iowa State College Press.

Hsu, P. L. (1939). On the distribution of roots of certain determinantal equations. Ann. Eugen.,
Lond., 9, 250-8.

Mood, A. M. (1951). On the distribution of the characteristic roots of normal second-moment matrices.
Ann. Math. Statist. 22, 266-73.

Nanda, D. N. (1951). Probability distribution tables of the largest root of a determinantal equation
with two roots. J. Indian Soc. Agric. Statist. 3, 175-7.

Pearson, E. S. & Wilks, S. S. (1933). Methods of statistical analysis appropriate for k samples of
two variables. Biometrika, 25, 353-78.

Pillai, K. C. S. (1956). On the distribution of the largest or the smallest root of a matrix in
multivariate analysis. Biometrika, 43, 122-7.

Rao, C. R. (1952). Advanced Statistical Method in Biometric Research. New York: John Wiley and Sons.

Roy, S. N. (1939). p-statistics, or some generalization in analysis of variance appropriate to
multivariate problems. Sankhya, 4, 381-96.

Roy, S. N. (1945). The individual sampling distribution of the maximum, the minimum, and any
intermediate of the p-statistics on the null hypothesis. Sankhya, 7, 133-58.

Roy, S. N. (1950). On some aspects of statistical inference. Proc. International Congress of
Mathematicians, pp. 555-64.

Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis.
Ann. Math. Statist. 24, 220-38.

Roy, S. N. (1954). A report on some aspects of multivariate analysis. Mimeo. Series no. 121, Institute
of Statistics, University of North Carolina.

Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24, 471-94.



















F. G. Foster and D. H. Rees 245
Generalized Beta distribution: 100P % points for x
ν1   P | ν2 = 2 | 3 | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 19 | 21
5 0-80 | 0-7011 | 0-7728 | 0-8454 | 0-8825 | 0-9052 | 0-9205 | 0-9315 | 0-9398 | 0-9464 | 0-9516 | 0-9559 
“85 | ‘7449 | “8075 | 8696 | 9013 | -9205 | -9334 | pov id *9497 -9552 | -9596 | -9632 
-90 | 7950 | -8463 8968 | *9221 | -9374 | -9476) 9605 | -9649 | -9633 | -9712 
‘95 | +8577) -8943 | -9296,| -9471 ‘9576 | -9645| -9696 | -9733 | 9763 | “9787 | “9806 
ad 9377 *9542 | “9698 | -9774  -9819 | pit 9888 | “9000 | 9910 | 9918 
| | | | | | 
7 0-80 | 0-5638 | 0-6469 | 0-7416 | 0-7954 | 0-8303 | 0-8550 | 0-8734 | 0- 8876 | 0-8989 lo 9082 | 0-9158 
“85 | 6851 | 7712 | -8194 | -8507 8726 | *8889 | -9014 | -9114 | 9196 | -9264 
-90| -6628| -7307 | “8058 | +8474 | “8741 | *8928 | 9066 | 9173 | *9257 | 9326 | -9383 
“95 7370 | ‘7919 | -8514) -8839 | -9045) -9189, -9295| -9376)| -9440| -9493)| -9536 
‘99 -8498 | -8826 -9173 -9358 *9475 | 9556 | *9615 | -9660 | 9695 | is ‘9748 
| | 
| | 
9 0-80 | 0-4688 | 0-5526 0: 6558 | 0-7189 | 0-7619 | 0-7933 | 08173 0- 8363 | 0-8517 | 0: *8644 | 0-8751 
*85 | -5108, -5903 | 6869 | ‘7452 | -7848| -8136) -8354| -8527 8606 | “8782 | -8878 
.90 | -5632 | 6366 | “7244 | -7768 | *8120 | +8374 | 8567 | 8720 | *8842 3943 | -9027 
95 | +6383 | -7017 | -7761 | 8197 | -8487 | -8696 *8853 | -8976 | “9076 | 9157 | +9225 
‘99 | -7635 | -8074 -8575 | 8862 | 9051 | -9185 "9286 | ‘9364 | | wis. -9521 
11 0-80 | 0-4003 | 0-4810  0-5859 | 0-6536 0-7016 | 0-7376 | 0-7657 | 0-7884 | -8069 | 0- 8225 | 0-8357 
“85 | +4389 | 5169 | 6169 | -6808 -7257 | -7592)| -7854) -8063 | 8235 | .8378 | -8500 
-90| -4880 ‘5617 | -6551 | -7138 7548 | -7854 | -8089 | “8278 | 8433 | *8561 -8670 
pee +5603 *6267 | -7091 “7600 | 7952 | *8212 8413 | 573 | “8702 | *8810 | -8902 
99 | 6878 | *7381 -7989 anes s *8607 | -8790 | -8929)| -9039 | “9128 | -9202 | -9265 
13 0-80 | 0-3490 | 04253 0-5286 | 0-5982 | 0-6489 | 0-6880 | 0-7190 | 0-7443 | 0-7654 | 0-7832 | 0-7984 
“85 | *3843 | -4590| +5589) -6252 6735 | *7103 | -7395 | +7632 -7829 | -7996 | -8138 
-90| -4298 ‘5016 | -5965 | -6587 ‘7035 | +7375 | -7644 “7862 | *8042 | -8194 +8324 
“95 | -4981 | -5646 | -6507 | -7063 7459 | “7757 -7992 ‘8181 | -8337| -8468) -8580 
.99| -6233| -6770| -7446| -7872| -8171| -8394| -8568 | -8706| -8821| -8915 | -8997 
| | | | 
15 0-80 C 3091 | 0-3809 | 0-4812 | 0-5508 | 0-6030 | 0-6439 | 0-6769 | 0-7042 | 0-7271 | 0-7467 | 0-7636 
85 | 3415 | -4124 +5102 °5775 | -6275 -6664 -6978 | -7236 *7453 | *7638 | -7798 
“90 3837 | +4527 5468 | -6106 | -6577 -6942 | -7235 | -7475 *7675 -7847 | -7993 
95 -4478 | +5130) 6003 | *6584 -7011 | -7338 | -7598 -7810 | -7989| -8138 -8268 
“99 -5687 *6237 | -6954 | -7422 ‘7758 | -8013 *8216 *8378 | "8512 | +8629 +8726 
17 0-80 | 0-2774 | 0-3447 | 0-4412 | 0-5101 | 0-5628 | 0-6047 0-6389 | 0-6676 | 0-6919 | 0-7129  0-7312 
‘85 | -3072| -3741 | -4690 | -5360| -5869  -6272 -6600 -6874 | -7105 | -7304  -7478 
90 | -3463 | -4122 -5043 -5685 ‘6169 | -6550 -6860 | -7116 7334 *7519 ‘7681 
‘95 | +4065 -4697 +5564 -6160 | -6605 -6951 *7232 -7464 -7659 *7825 | -7969 
“99 -5222 ‘5773 -6512 -7008 7373 | ‘7652 | -7875 -8061 *8212 *8346 | -8458 
| 
19 0-80 | 0-2515 | 0-3148 | 0-4073 | 0-4748 | 0-5273 | 0-5697 | 0-6047 | 0-6344 | 0-6597 | 06817 0-7010 
85, -2791 +3424 +4338 -4999 -5509 | -5919 -6257 -6541 | -6785 -6995 -7179 
90 | *3155 | +3782 ‘4677 | -53815| +5805) +6196 -6517 -6786 | -7016) +7214) +7387 
95 | -3719 | +4827) -5182) -5782) -6238| -6599 | -6894| -7139 7349 *7528 | -7684 
‘99 | -4823 | -5369) -6116| -6630) -7014 | 7313 | -7555 | +7756 | +7926 -8071 | -8198 
| | | | 7 
21 0-80 | 0-2300 | 0-2895 | 0-3782 | 0-4439 0-4959 | 0-5383 | 0-5738 | 0-6040 | 0-6301 _ 06529 0-6729 
85 | +2557 | -3155| -4034| -4682/| -5189 | +5602 | -5946| -6237| -6489  -6707 | -6900 
90 | *2897 | +3493 | 4358 | -4988 | -5479 | “5875 | -6204 | -6482 | -6721 | -6929 | -7112 
95| -3427| -4012) -4847/) -5445| -5906| -6277 | -6581, -6838| -7058 +7248 -7415 
‘99 679 aud sated 0285 "6685 | 6997 | “7256 | -7469 | -7652 | ‘7810 | -7946 
} | | | | 
This table gives the values of x for which $\Pr(\theta_{\max} < x) = I_x(2; p, q) = P$, where $p = \tfrac12(\nu_2 - 1)$, $q = \tfrac12(\nu_1 - 1)$.































Generalized Beta distribution (cont.) 























ν1   P | ν2 = 2 | 3 | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 19 | 21
23 0-80 | 0-2119 | 0-2680 | 0-3528 | 0-4167 | 0-4678 | 0-5101 | 0-5457 | 0-5763 | 0-6028 | 0-6262 | 0-6469 
| "85 | +2359 | +2924) -3769 ‘4401 | -4903 | -5315 +5661 | +5958 -6215| -6441 | -6640 
| 90 | +2677 | -3244| -4080| -4699| -5185| -5584| -5918| -6202 | 6448 | -6663 | -6853 
95 | -3177| -3737| -4551 5143 | -5606) -5981 | 6294 | -6558 | -6787| -6986| -7160 
99 -4179 | -4701 5443 | -5971 | -6380| -6703| -6970| -7197| -7391| -7558| -7705 
| | 
25 0-80 | 0-1964 | 0-2494 | 0-3306 | 0-3926 | 0-4428 | 0-4846 | 0-5202 | 0-5509 | 0-5777 | 0-6015 | 0-6226 
85 | -2189| -2725| -3536| +4151) -4645 | -5055)| -5402 | ‘5702 | -5963| -6193 | -6397 
| 90 | +2488 | -3027| -3834| -4439|) -4921| -5319| -5655) +5944 | -6194 | -6415| -6610 
‘95 | +2960 | +3498 | -4287| -4872| -5333 |) -5710 — -6027 | 6298 | -6533 | -6738 |) -6920 
‘99 -3915| -4424| -5155| -5685| -6096| -6429 6708 | 6941 | -7143| -7319| -7474 
| | | 
| 27 0-80 | 0-1830 | 0-2333 | 0-3110 | 0-3711 | 0-4202 | 0-4615 | 0-4968 | 0-5275 | 0-5546 | 0-5785 | 0-6000 
| 85 | +2042 | -2551| -3331| -3928| -4413) -4818/| -5165 5465 | -5728| -5962| -6170 
90} -2324| -2839| -3616| -4206) -4682| -5077/| -5413| -5704| -5958| -6183/ -6383 
‘95| -2771| -3286| -4052| -4626| -5084 -5462| -5781 | -6056| -6296| -6506| -6693 
| 99 | -3682| -4176| -4895| -5422| -5837| -6175 | “6458 | 6700 | +6909 | -7092 -7254 
| | | 
29 0-80 | 0-1713 | 0-2191 | 0-2936 | 0-3518 | 0-3998 | 0-4404 | 0-4754 | 0-5060 | 0-5331 | 0-5572 | 0-5789 
*85| -1913| -2398| -3147| -3727| -4202| -4603/ -4947| -5247| -5512| -5747| -5958 
-90 | -2180| -2672| -3420! -3996| -4463) -4855 | ‘5191 | -5482| -5738 | -5966| -6170 
‘95 | +2604} -3099| -3840| -4404 | "4856 | 5232 | -5553 | -5830 | +6074 | +6288 | -6480 
‘99 | +3475 | +3954) -4659| -5182 ened -5938 | -6225 “6472 | -6687 -6876 | -7043 
31 0-80 | 0-1611 | 0-2065 | 0-2780 | 0-3344 | 0-3812 | 0-4211 | 0-4557 | 0-4861 | 0-5131 | 0-5373 | 0-5591 
-85| -1800| -2262| -2982| -3546/ -4010| -4405| -4746| -5045| -5309| -5546| -5759 
90 | +2053 | +2523) -3246| -3805| -4264 -4651 | -4985 | poe -5534 | -5763| +5969 
‘95 | +2457| -2931) -3650! -4200| -4647| -5022) -5342| -5620! -5866| -6082| -6278 
99 | 3290 | +3754 | -4444) -4961 | +5374 | ‘5717 | -6008| -6258/ -6477| -6671| -6843 
33 0-80 | 0-1519 | 0-1953 | 0-2639 | 0-3186 | 0-3643 | 0-4034 | 0-4375 | 0-4677 | 0-4946 | 0-5187 | 0-5406 
‘85| +1699} -2141| -2834| -3381| -3835| -4223/ -4560| -4857| -5121| -5358| +5572 
‘90 -1940| -2389| -3087| -3632| -4081| -4464 -4794| -5084| -5342! -5572| -5781 
“95 | 2324 | 2781 | ‘3478 | -4015| -4455| -4825| -5145| -5424| -5671| -5890| -6088 
ia ‘3123 | +3573 | 4247 | 4757 | +5168 -5511| +5803 -6057| +6279 -6476 | +6653 
35 0:80 | 0-1438 | 0-1852 | 0-2512 | 0-3042 | 0-3487 | 0-3871 | 0-4208 | 0-4506 | 0-4773 | 0-5014 | 0-5283 
| 85 | -1609| 2032) -2699| -3230| -3674| -4055| -4388 -4682| -4945 | ‘5181 | -5396 
-90| -1838| -2270| -2943| -3473| -3913!| -4290| -4617| -4906| -5163| -5394| -5603 
‘95 | +2206) +2645 -3320| -3845| -4277  -4644, -4962| -5240| -5487 | -5708 | +5907 
| -99| -2972| -3408| -4066| -4569| -4974| -5318| -5612| -5867| -6091| -6292| -6471 
| | | 
37 0-80 | 0-1365 | 0-1762 | 0-2397 | 0-2910  0-3344 | 0-3721 | 0-4052 | 0-4347 | 0-4612 | 0-4852 | 0-5070 
85| -1528| -1933 -2577| -3092| -3526| -3900|) -4228| -4519| -4781| -5017) -5231 
-90| -1747| -2161/| -2812| -3328!| -3759! -4129| -4453 -4739  -4995 -5226| -5434 
95 2098 | -2521) +3175) +3689) -4113| -4475|) -4791! +5068) +5315 | -5536| -5737 
99 | +2834) -3257| -3900) -4394| -4797| -5138, -5430| -5688| -5915)| -6117) -6299 
| 
| 
39 0-80 | 0-1299 | 0-1679 | 0-2292 | 0-2790 | 0-3213 | 0-3582 | 0-3907 | 0-4198 | 0-4460 0-4699 | 0-4916 
‘85 | +1455 +1844) +2465 +2965) -3389 | +3756) -4079| -4367 | -4627  -4861 | -5076 
‘90 | +1664 -2062 -2691| -3194) -3616) -3980) -4299 -4583)| +4837 -5067) -5277 
‘95 -2001 +2408 -3044)| +3544 -3961 |) -4319 -4631|) -4906 -5152) 5375 +5576 
·99 | ·2709 | ·3119 | ·3747 | ·4232 | ·4631 | ·4969 | ·5260 | ·5517 | ·5747 | ·5950 | ·6134

This table gives the values of x for which $\Pr(\theta_{\max} < x) = I_x(2; p, q) = P$, where $p = \tfrac12(\nu_2 - 1)$, $q = \tfrac12(\nu_1 - 1)$.











Generalized Beta distribution (cont.) 
ν1   P | ν2 = 2 | 3 | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 19 | 21
| 41 0-80 | 0-1239 | 0- 1604 | 0-2195 | 0-2679 | 0-3091 | 0-3452 | 0-3772 | 0-4059 0- ine! 0-4555 | 0-4771 
-85| -1388| -1762 -2362| -2849/| -3262| -3622| -3940| -4225 | 4482 4715 | -4928 
‘90 | -1589 | -1972| -2582| +3070) -3483| -3840 4155 | 4436 | 4689 | +4918) | -5127 
‘95| +1912) -2306 | -2922/ -3411 | bed 4172 | +4480) +4754 -4999 | +5221) +5423 
99 | +2594 | 2093) -3605| -4079| -4472/ -4812| -5100 ‘5358 | +5585 -5792| -5977 
| | | | 
51 0-80 | 0-1006 | 0-1311 | 0-1814 | 0-2233 | 0-2598 | 0-2923 | 0: "3216 | 0-3482 (0 +3725 |o. 3950 | 0-4157 
‘85 -1129| -1443 -1955| -2380| -2748| -3073 -3366 | -3631| -3874| -4097| -4303 
| 90} -1295| -1619! -2142| -2571| -2941 | -3267| -3559| -3823| -4063| -4284| -4487 
‘95 | +1565 | -1900| -2434/ +2868 -3239| -3564| -3853 | -4114| -4350| -4566| -4765 
‘99 | +2140 | -2486| -3029| -3462  -3828| -4144| -4424 | haces -4900 | +5106 | +5295 
61 0-80 | 0-0846 | 0-1109 | 0-1545 | 0-1914 | 0-2240 | 0-2534 | 0-2801 | | 0-3047 | 0-3274 | 0-3485 | 0-3682 
*85| -0951  -1221 -1667| -2043| -2372| -2668 2936, ‘3182 | -3409|) -3620) -3816 
90} +1093) +1373) -1830| -2211| -2544| -2842| -3111| -3357| -3583| -3792| -3987 
‘95| +1324) -1615) -2086| -2474| -2810) -3109| -3378 -3622| +3847 -4054| -4246 
= *1821 | +2126) -2610| -3004 -3341 | -3637| +3903 | -4142)| -4361_ “4561 | -4743 
| | | | | 
| 71 0-80 | 0-0730 | 0-0960 | 0-1345 | 0-1675 | 0-1969 | 0-2236 | 0-2481 | 0-2708 | 0-2919 | 0-3117 | 9-3303 
85 | -0822| -1059| -1454| -1789| -2087| -2357| -2604| -2832| -3043| -3241| -3427 
90 | -0946| -1191 | -1597| -1939) -2242| -2514| -2762| +2991) -3203 | -3401| -3586 
-95| -1148| -1404/ -1824| -2175| -2481| -2756| -3006| -3236| -3447| -3644| -3828 
‘99 | +1584 "1855 | +2293 ‘2651 | -2963| -3239| -3488 | -3715| -3924| -4118| -4298 
| | | | 
81 sie 0-0643 | 0-0847 | o-1191 0:1489 0-1756 | 0-2000 | 0-2226 | 0-2437 | 0-2634 | 0-2819 | 0-2994 
85 | -0723 | 0934 | ‘1288 | -1592 | -1863| +2110] -2338| -2550| +2748) -2934/ -3109 
| ‘90 | -0833 | -1052| -1417| -1727) -2003| -2253| -2483| -2697| -2896| -3082/ -3257 
| 95 -1013| -1242) -1620| -1939|) -2221| -2475| -2707| -2922| -3122| -3308| -3483 
| 99-1402) +1646) -2044| -2374| -2662| -2919| -3153| -3368 3607 | 3751 | +3924 
| | | | 
91 0-80 | 0-0574 | 0-0757 | 0-1069 | 0-1340 wale | 0-1810 | 0-2019 | 0-2215 | 0-2399 | 0-2573 | 0-2738 
85-0646 | -0836| +1156} -1433| -1682| -1910| -2122| -2319| +2505) -2680) -2845) 
90 -0745| -0942|) -1273| -1556| -1810| -2042| +2255) -2455| -2642| -2818| -2984 
‘95! -0906| -1114| -1457| -1750) +2010) +2245) +2462) -2664| -2852) -3029 -3195 
‘99 +1257 | +1479) -1844| +2149) -2416| -2657| -2874| -3081| -3270| -3444/ -3608 
| | | | 
101 0-80 | 0-0518 | 0-0685 | 0-0969 | 0-1218 | 0-1444 0-1652 | 0-1847 | 0-2029 | 0- 2202 | 0-2366 0-2522 
$5 | 0584-0756 | 1049) +1304 +1533 1745) +1942) -2127 | 2301 | +2466 +2622 
90} -0673 | -0853 | +1155) +1416) -1651 +1866 | +2066 -2253 | -2428 | +2595 | +2752 
95) -0820) -1009| +1325) +1594 -1835 | +2055 | +2258 +2447) -2625/ +2792 -2951 
99 | +1140 | -1343| -1678| -1961 | -2211) +2436  -2644 +2836) -3015 | +3184) -3342 
(121 0-80 | 0-0434 | 0-0575 | 0-0817 | 0-1030 | 0-1225 | 0-1407 | 0-1577 | 0-1739 | 0-1892 | 0-2038  0-2178 
| 85} -0489 | -0635| -0885| -1104) -1303) -1487| +1660) -1824| -1978 | +2126 +2267 
‘90-0565 -0717| -0976| -1200| -1404| -1593| -1768| -1934| -2091 +2240 -2382 
95! -0688| -0849 | -1120| -1353| -1563) -1756| -1936| -2105| -2264|) +2415) -2559 
‘99 -0960 -1133| -1423| -1670| -1891 -2090| -2274| -2447| -2610| +2763) +2907 
| | 
161 0-80  0-0328 | 0-0435 | 0-0622 | 0-0788 | 0-0941 | 0-1085 | 0-1221 | | 0-1351 | | 0-1476 | 0-1596 | 0-1711 
‘85-0370 -0481 | -0674| -0844| -1001| -1148 | -1287| -1419| -1545| -1666| -1783 
90-0427) -0544) -0744| -0920| -1081  -1231|) -1372| -1507| +1635) -1758)| -1876 
‘95, -0521 0645) -0856) -1039  -1206  -1361 | -1506| -1644| -1775| +1900 -2020 
·99 | ·0730 | ·0864 | ·1092 | ·1288 | ·1464 | ·1627 | ·1778 | ·1921 | ·2056 | ·2185 | ·2309

This table gives the values of x for which $\Pr(\theta_{\max} < x) = I_x(2; p, q) = P$, where $p = \tfrac12(\nu_2 - 1)$, $q = \tfrac12(\nu_1 - 1)$.











[ 248 ] 


MISCELLANEA 


On problems in which a change in a parameter occurs at an unknown point 


By E. S. PAGE 
Department of Mathematics, University of Durham 


1. INTRODUCTION 


Given a sample of n independent observations in the order in which they were obtained, $x_1, \ldots, x_n$, we
consider the problem of testing the null hypothesis that all the observations come from the same
population with distribution function $F(x \mid \theta)$ against the alternatives that the first m ($0 \le m < n$),
$x_1, \ldots, x_m$, come from $F(x \mid \theta)$ and $x_{m+1}, \ldots, x_n$ come from $F(x \mid \theta')$ ($\theta' \ne \theta$), where m is unknown.
In an earlier paper (Page, 1955) a one-sided test for a change in the mean of a distribution was proposed;
the procedure was to record the cumulative sum $S_r = \sum_{j=1}^{r} (x_j - \theta)$, and, if the mean path after the
change had a greater slope than that before the change, to use as a test statistic the rise in the
cumulative sum above its least value, i.e.

$$\max_{0 \le r \le n} \Bigl(S_r - \min_{0 \le i < r} S_i\Bigr), \qquad S_0 = 0,$$

large values being significant. Some critical values were given for the case where the $x_i$ are 0 or 1
binomial variables. Here, a discrimination approach is adopted for the more
general case and the procedure is shown to yield a modification of the previous test.


2. ONE-SIDED CASE 


Suppose $\theta$, $\theta'$ known, and let the sample be $\mathbf{x} = (x_1, \ldots, x_n)$. If we regard the detection of a change as a
problem in discrimination we have n + 1 hypotheses between which it is desired to discriminate; they are
$H_0, H_1, \ldots, H_n$, where $H_i$ is the hypothesis that the first i observations are drawn from $F(x \mid \theta)$ and the
remainder from $F(x \mid \theta')$, i.e. that the change occurs after the ith observation. A method of discrimination
is specified by the definition of a division of the whole sample space into mutually exclusive regions
$R_i$ ($i = 0, 1, \ldots, n$); the hypothesis $H_i$ is preferred if the sample point falls within $R_i$. If the $H_i$ are a priori
equally likely the minimum probability of misclassification is obtained when the regions $R_i$ ($i = 0, 1, \ldots, n$)
are defined by (Rao, 1948),

$$\mathbf{x} \in R_i \quad \text{if} \quad L_i(\mathbf{x}) \ge L_j(\mathbf{x}) \quad (j \ne i), \tag{2·1}$$

where $L_i(\mathbf{x})$ is the likelihood of the hypothesis $H_i$.
Example 1. Mean of a normal distribution 


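Suppose that we wish to discriminate for a change in the mean from $\mu$ to $\mu + \delta$ ($\delta > 0$) with the standard
deviation, $\sigma$, known and constant. Then (2·1) gives $R_i$ defined by

$$\exp\Bigl\{-\frac{1}{2\sigma^2} \sum_{j=i+1}^{i+k} \bigl[(x_j - \mu - \delta)^2 - (x_j - \mu)^2\bigr]\Bigr\} \ge 1 \quad (k = 1, \ldots, n-i),$$

$$\exp\Bigl\{-\frac{1}{2\sigma^2} \sum_{j=i-k}^{i} \bigl[(x_j - \mu - \delta)^2 - (x_j - \mu)^2\bigr]\Bigr\} \le 1 \quad (k = 0, 1, \ldots, i-1).$$

Hence $R_i$ is defined by

$$\sum_{j=i+1}^{i+k} (x_j - \mu - \tfrac12\delta) \ge 0 \quad (k = 1, \ldots, n-i), \tag{2·2}$$

$$\sum_{j=i-k}^{i} (x_j - \mu - \tfrac12\delta) \le 0 \quad (k = 0, 1, \ldots, i-1). \tag{2·3}$$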
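
The passage from the squared differences to the linear sums in (2·2) and (2·3) uses only an expansion of the squares. Written out as a worked step (added for clarity, in the notation of the text):

```latex
\begin{align*}
(x_j-\mu-\delta)^2-(x_j-\mu)^2
  &= -2\delta(x_j-\mu)+\delta^2
   = -2\delta\,\bigl(x_j-\mu-\tfrac12\delta\bigr),\\
-\frac{1}{2\sigma^2}\sum_{j}\bigl[(x_j-\mu-\delta)^2-(x_j-\mu)^2\bigr]
  &= \frac{\delta}{\sigma^2}\sum_{j}\bigl(x_j-\mu-\tfrac12\delta\bigr).
\end{align*}
```

Since $\delta/\sigma^2 > 0$, the exponent is non-negative (non-positive) exactly when the corresponding sum of the $(x_j - \mu - \tfrac12\delta)$ is non-negative (non-positive).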


The discrimination procedure can be more clearly described in terms of the cumulative sum of the
$(x_j - \mu - \tfrac12\delta)$. Let $S_r = \sum_{j=1}^{r} (x_j - \mu - \tfrac12\delta)$, $S_0 = 0$. Then the sample point $\mathbf{x} = (x_1, \ldots, x_n)$ lies within $R_i$ if
$S_i \le S_j$ for all j. For (2·2) is

$$S_{i+k} - S_i \ge 0 \quad (k = 1, \ldots, n-i),$$

and (2·3) is

$$S_i - S_{i-k-1} \le 0 \quad (k = 0, 1, \ldots, i-1). \tag{2·4}$$

Hence $S_i$ is the least of the cumulative sums. Thus the procedure may be conveniently carried out by
recording the cumulative sum on a chart and selecting the hypothesis corresponding to the minimum
of the sum.
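
The charting procedure of Example 1 amounts to locating the minimum of a cumulative sum. A minimal sketch, assuming equal prior weights on the hypotheses; the function name `preferred_change_point` and the data are hypothetical and serve only as illustration:

```python
from itertools import accumulate

def preferred_change_point(xs, mu, delta):
    """Return (i, sums): i minimises S_i, where S_0 = 0 and
    S_r = sum over j <= r of (x_j - mu - delta/2).

    i = len(xs) corresponds to H_n (no change); ties go to the smallest i.
    """
    sums = [0.0] + list(accumulate(x - mu - delta / 2 for x in xs))
    i = min(range(len(sums)), key=sums.__getitem__)
    return i, sums

# Illustrative data (made up): mean near 0 for four observations, then near 1.
data = [0.1, -0.3, 0.2, -0.1, 1.2, 0.9, 1.1, 0.8]
i_hat, sums = preferred_change_point(data, mu=0.0, delta=1.0)
```

For these data the cumulative sum is least after the fourth observation, so the hypothesis preferred is that the change occurs after the fourth observation.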




Miscellanea 249 


Example 2. Distributions with a sufficient statistic 
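Consider distributions with frequency functions of Koopman's form

$$f(x \mid \theta) = \exp\{A(\theta) B(x) + C(\theta) + D(x)\}.$$

Then

$$f(x \mid \theta') / f(x \mid \theta) = \exp\{B(x)\,\Delta A(\theta) + \Delta C(\theta)\},$$

where $\Delta G(\theta) = G(\theta') - G(\theta)$. Accordingly the regions $R_i$ are given by

$$\Delta A(\theta) \sum_{j=i+1}^{i+k} B(x_j) + k\,\Delta C(\theta) \ge 0 \quad (k = 1, \ldots, n-i),$$

$$\Delta A(\theta) \sum_{j=i-k}^{i} B(x_j) + (k+1)\,\Delta C(\theta) \le 0 \quad (k = 0, \ldots, i-1). \tag{2·5}$$

If $\Delta A(\theta) > 0$, the cumulative sum

$$S_i = \sum_{j=1}^{i} \Bigl\{B(x_j) + \frac{\Delta C(\theta)}{\Delta A(\theta)}\Bigr\} \tag{2·6}$$

may be recorded and its minimum used to select the hypothesis preferred. If $\Delta A(\theta) < 0$, the maximum of
the same cumulative sum is used in the discrimination procedure.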

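Thus for a normal population whose mean remains constant at $\mu$ but whose variance may change
from $\sigma^2$ to $\sigma'^2$, the cumulative sum to be recorded is

$$S_i = \sum_{j=1}^{i} \Bigl\{(x_j - \mu)^2 + 2\Bigl[\frac{1}{\sigma'^2} - \frac{1}{\sigma^2}\Bigr]^{-1} \log\frac{\sigma'}{\sigma}\Bigr\}.$$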
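
The cumulative sum for a change in variance follows from (2·6) once the normal density with known mean is written in Koopman's form. A worked identification (added for clarity; $\sigma'$ denotes the standard deviation after the change):

```latex
\begin{align*}
f(x\mid\sigma) &= \exp\Bigl\{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}-\log\sigma-\tfrac12\log 2\pi\Bigr\},\\
A(\sigma) &= -\frac{1}{2\sigma^{2}},\qquad B(x)=(x-\mu)^{2},\qquad C(\sigma)=-\log\sigma,\\
\Delta A &= \tfrac12\Bigl(\frac{1}{\sigma^{2}}-\frac{1}{\sigma'^{2}}\Bigr),\qquad
\Delta C=-\log\frac{\sigma'}{\sigma},\\
\frac{\Delta C}{\Delta A} &= 2\Bigl(\frac{1}{\sigma'^{2}}-\frac{1}{\sigma^{2}}\Bigr)^{-1}\log\frac{\sigma'}{\sigma}.
\end{align*}
```

Here $\Delta A > 0$ when $\sigma' > \sigma$, so for an increase in variance the minimum of the cumulative sum selects the hypothesis, and for a decrease the maximum is used.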
In some situations it may be a priori more probable that the initial value of the parameter will persist
throughout the sampling than that a change should occur. If the hypothesis $H_n$, that no change from the
initial value has occurred, is a priori c times as likely as any of the other hypotheses, then the regions
$R_i$ for the discrimination procedure are defined by

$$\mathbf{x} \in R_i \quad \text{if} \quad L_i(\mathbf{x}) \ge L_j(\mathbf{x}) \quad (j \ne i;\ i, j \ne n), \qquad L_i(\mathbf{x}) \ge c\,L_n(\mathbf{x}) \quad (i \ne n), \tag{2·7}$$

$$\text{and} \quad \mathbf{x} \in R_n \quad \text{if} \quad c\,L_n(\mathbf{x}) \ge L_i(\mathbf{x}) \quad (i \ne n).$$


For distributions with a sufficient statistic the above conditions amend (2·5) only for the comparison
between $H_i$ and $H_n$. $H_i$ is preferred to $H_n$ if

$$\Delta A(\theta) \sum_{j=i+1}^{n} B(x_j) + (n-i)\,\Delta C(\theta) \ge \log c. \tag{2·8}$$

Accordingly the hypothesis to be preferred is that indicated by the minimum (for $\Delta A(\theta) > 0$) of the
cumulative sum, $S_i$, of (2·6) if the rise of $S_n$ above its least value exceeds $\log c / \Delta A(\theta)$; otherwise the
hypothesis that no change has occurred is adopted. For the test on a normal mean with unit variance
this rise above its minimum is $\delta^{-1} \log c$.
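
The rule with unequal prior weights can be sketched for the normal mean with unit variance, where the required rise is $\delta^{-1} \log c$. This is an illustration under stated assumptions, not a definitive implementation: the function name `preferred_hypothesis` and the data are hypothetical, and the boundary case of equality is resolved in favour of a change, following the weak inequality in (2·8).

```python
import math
from itertools import accumulate

def preferred_hypothesis(xs, mu, delta, c=1.0):
    """Discriminate for a shift mu -> mu + delta (delta > 0), unit variance.

    Returns i in 0..n-1 (change after the i-th observation) or n (no change).
    H_n is taken to be c times as likely a priori as each other hypothesis.
    """
    n = len(xs)
    sums = [0.0] + list(accumulate(x - mu - delta / 2 for x in xs))
    i_min = min(range(n), key=sums.__getitem__)   # least S_i over i < n
    rise = sums[n] - sums[i_min]                  # rise of S_n above it
    return i_min if rise >= math.log(c) / delta else n

# Illustrative data (made up); the rise of S_n above the minimum is about 2.0.
data = [0.1, -0.3, 0.2, -0.1, 1.2, 0.9, 1.1, 0.8]
choice_c5 = preferred_hypothesis(data, mu=0.0, delta=1.0, c=5.0)
choice_c10 = preferred_hypothesis(data, mu=0.0, delta=1.0, c=10.0)
```

With c = 5 the required rise is log 5, about 1·61, and a change after the fourth observation is preferred; with c = 10 the required rise is about 2·30 and the hypothesis of no change is adopted.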


3. A ONE-SIDED TEST 


The procedure of § 2 is one that can reasonably be used as a test of the hypothesis that the observations
$x_1, \ldots, x_n$ are all drawn from the same population, $F(x \mid \theta)$. If $F(x \mid \theta)$ is of Koopman's form the
appropriate cumulative sum is plotted and the null hypothesis rejected if the final point of the sample path is
at least a given distance above its minimum ($\Delta A > 0$) (Fig. 1). We note that in this test the criterion is the
distance of the end-point of the path above the minimum value, whereas in the earlier test (Page, 1955) 
we considered the greatest distance of the path above the minimum. The properties of this test are likely 
to be difficult to calculate as it is equivalent to a truncated sequential test. However, an approximation 
to the power function which is valid for a large sample and when a large rise in the cumulative sum is 
required for the rejection of $H_n$ may be obtained from the normal diffusion process in the presence of
an absorbing barrier. Since the test is a rise in the cumulative sum at the end of the path, we may express 
it in terms of the end-point as a fall below that value. Thus if we look at the sample in the reverse order, 
the criterion is whether or not the cumulative sum crosses a fixed horizontal boundary within n steps; 
that is, the problem is that of a truncated random walk in the presence of an absorbing barrier. 

A very simple approximation to the probability that the boundary is crossed within n steps when the
increments, $y_i$, have some symmetrical distribution about zero mean and standard deviation $\sigma$, has been
given by Armitage (1957). Let $p_i$ be the probability that the cumulative sum $\sum y_j$ exceeds a constant
k for the first time after the ith observation. Then the probability that the final sum exceeds k given that
the sum first did so after the ith observation is slightly greater than $\tfrac12 p_i$. Hence by summation,

$$\Pr\{S_n \ge k\} \gtrsim \tfrac12 \sum_{i=1}^{n-1} p_i + p_n \gtrsim \tfrac12 \sum_{i=1}^{n} p_i = \tfrac12 \Pr(\text{boundary is crossed within } n \text{ steps}),$$
[Figure: cumulative sum plotted against no. of observations, 0 to 50.]

Fig. 1. One-sided test. Sampling experiment on random normal deviates; for n = 1, ..., 20, μ = 0,
σ = 1; for n = 21, ..., 50, μ = 0·5, σ = 1. Cumulative sum plotted is Σ(x − 0·25).


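where the symbol $\gtrsim$ means 'slightly greater than'. Hence

$$\Pr(\text{boundary is crossed}) \simeq 2\Bigl[1 - \Phi\Bigl(\frac{k}{\sigma\sqrt{n}}\Bigr)\Bigr] \tag{3·1}$$

using the central limit theorem, where $\Phi(t) = (2\pi)^{-1/2} \int_{-\infty}^{t} \exp(-\tfrac12 x^2)\,dx$.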

A more general approximation for increments with non-zero mean, which we need here, is obtained 
as in Cox’s (unpublished) treatment of Armitage’s problem, by replacing the random walk in discrete 
time by one in continuous time. For such a process in which the increment per unit time has mean m 
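and variance $\sigma^2$, the probability density of the cumulative sum at time t is

$$f(y, b, t) = (2\pi\sigma^2 t)^{-1/2} \Bigl[\exp\Bigl\{-\frac12\Bigl(\frac{y}{\sigma\sqrt{t}} - \frac{m\sqrt{t}}{\sigma}\Bigr)^2\Bigr\} - \exp\Bigl(\frac{2bm}{\sigma^2}\Bigr) \exp\Bigl\{-\frac12\Bigl(\frac{y - 2b}{\sigma\sqrt{t}} - \frac{m\sqrt{t}}{\sigma}\Bigr)^2\Bigr\}\Bigr], \tag{3·2}$$

where the boundary is at y = b (b > 0) (e.g. Bartlett, 1955, p. 48). The probability that the boundary
has not been crossed in the interval (0, T) is therefore

$$g(b, T) = \int_{-\infty}^{b} f(y, b, T)\,dy = \Phi\Bigl(\frac{b}{\sigma\sqrt{T}} - \frac{m\sqrt{T}}{\sigma}\Bigr) - e^{2mb/\sigma^2}\,\Phi\Bigl(-\frac{b}{\sigma\sqrt{T}} - \frac{m\sqrt{T}}{\sigma}\Bigr). \tag{3·3}$$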
Hence given m and $\sigma^2$, the mean and variance of the increment, y, and T = n, the number of observations,
we can apply this result directly to our test to find the position, b, of the boundary so that the hypothesis
that all the x's come from the specified distribution will be rejected when it is true with probability
approximately equal to any given value. A few values are shown in Table 1 for a probability 0·05 of Type I errors. For
example, to test whether in fifty observations a change from $\mu$ to $\mu + 0{\cdot}2\sigma$ occurs in the mean of a normal
population, the sum $\sum (x_j - \mu - 0{\cdot}1\sigma)$ is plotted in the presence of a boundary at $10{\cdot}0\sigma$.

Table 1. Approximate values of b/σ

   m/σ       n = 25      50      100
 −0·025        9·3      12·8    17·6
 −0·05         8·8      11·8    15·6
 −0·10         7·8      10·0    12·2
 −0·20         6·1       6·9     7·4
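
The entries of Table 1 can be checked numerically from the diffusion formula for the probability that the boundary is not crossed. The sketch below is only an illustration: it assumes σ = 1 (so that b is in units of σ), solves for b by bisection, and uses the hypothetical names `prob_not_crossed` and `boundary`; the method by which the table was originally computed is not stated.

```python
import math

def phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def prob_not_crossed(b, T, m, sigma=1.0):
    """g(b, T): probability that a boundary at b > 0 is not crossed in (0, T)
    by a diffusion with mean m and variance sigma**2 per unit time."""
    root = math.sqrt(T)
    return (phi(b / (sigma * root) - m * root / sigma)
            - math.exp(2.0 * m * b / sigma ** 2)
            * phi(-b / (sigma * root) - m * root / sigma))

def boundary(T, m, alpha=0.05, sigma=1.0):
    """Solve 1 - g(b, T) = alpha for b by bisection (g increases with b)."""
    lo, hi = 0.0, 1.0
    while 1.0 - prob_not_crossed(hi, T, m, sigma) > alpha:
        hi *= 2.0                      # expand until the root is bracketed
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 1.0 - prob_not_crossed(mid, T, m, sigma) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

b_50 = boundary(T=50, m=-0.10)         # compare with the entry 10.0 in Table 1
```

For m/σ = −0·10 and n = 50 the solution agrees with the tabulated 10·0 to within rounding.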

















The diffusion approximation is of the same kind as the Wald approximation for the characteristics 
of sequential tests and, as the latter are usually sufficient for practical purposes, it is reasonable to sup- 
pose that in this case the same holds. By an enumeration of paths to the boundary, it is possible to 
evaluate exactly the probabilities for the binomial case and these agree fairly well with those given by 
the approximate formulae (Armitage, 1957). 


4. TWO-SIDED CASE 


Let us now consider the case where the alternatives to the hypothesis that the observations all come
from a distribution $F(x \mid \theta)$ are that the first i observations come from $F(x \mid \theta)$ and the remainder all
come from either $F(x \mid \theta')$ or $F(x \mid \theta'')$, where $\theta' > \theta > \theta''$, i.e. the change in the parameter may be in either
direction. We suppose $\theta$, $\theta'$, $\theta''$ known. Then in the discrimination problem we have 2n + 1 completely
specified hypotheses $H_n$, $H_i^+$, $H_i^-$ ($i = 0, 1, \ldots, n-1$), where $H_i^+$ ($H_i^-$) are the hypotheses that the first
i observations come from $F(x \mid \theta)$ and the remainder from $F(x \mid \theta')$ ($F(x \mid \theta'')$); the superfix +, − thus
indicates the direction of the change in $\theta$. As before, we divide the whole sample space into (2n + 1)
mutually exclusive regions $R_n$, $R_i^+$, $R_i^-$ ($i = 0, \ldots, n-1$) such that the region in which the sample point
falls indicates the hypothesis preferred. When the hypotheses are a priori equally likely the probability
of misclassification (or the expected loss if the weighted losses are equal) is minimized if the regions are
defined by:

$$\mathbf{x} \in R_i^+ \quad \text{if} \quad L_i^+(\mathbf{x}) \ge \max\bigl(L_j^+(\mathbf{x}),\ L_k^-(\mathbf{x}),\ L_n(\mathbf{x})\bigr) \quad (0 \le j, k \le n-1;\ j \ne i;\ i = 0, 1, \ldots, n-1),$$

$$\mathbf{x} \in R_n \quad \text{if} \quad L_n(\mathbf{x}) \ge \max\bigl(L_i^+(\mathbf{x}),\ L_i^-(\mathbf{x})\bigr) \quad (0 \le i \le n-1), \tag{4·1}$$

with a similar definition for $R_i^-$, where $L_i^+(\mathbf{x})$, $L_i^-(\mathbf{x})$, $L_n(\mathbf{x})$ are the likelihoods of the corresponding
hypotheses. A comparison of the $H^+$'s ($H^-$'s) with $H_n$ leads, of course, to the one-sided criterion for
discrimination. If $H_n$ is preferred under both the one-sided procedures, it is preferred under the
two-sided procedure; if $H_n$ is preferred by one, but a different hypothesis, say $H_i^+$, by the other, then $H_i^+$ is
preferred by the two-sided procedure. If each one-sided procedure prefers some hypothesis other than
$H_n$ it is necessary to use the additional conditions implied in (4·1) to choose which of the two is preferred
by the two-sided procedure.
Example 3. Mean of a normal distribution 


Suppose that the three possible means are 0, $\pm\mu$, and that the variance is unity. Then $H_n$, i.e. that no
change has occurred, is preferred if

$$\max_{0 \le i \le n-1} \sum_{k=i+1}^{n} (x_k - \tfrac12\mu) \le 0 \quad \text{and} \quad \min_{0 \le j \le n-1} \sum_{k=j+1}^{n} (x_k + \tfrac12\mu) \ge 0. \tag{4·2}$$

If just one of these inequalities is unsatisfied, suppose it is the first and that the maximum occurs for
i = l. Then $H_l^+$ is preferred. If both inequalities are unsatisfied, the maximum and minimum occurring
at i = l, j = m, then $H_l^+$ is preferred to $H_m^-$ if

$$\sum_{k=l+1}^{n} (x_k - \tfrac12\mu) \ge -\sum_{k=m+1}^{n} (x_k + \tfrac12\mu), \tag{4·3}$$

and conversely.
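
The two-sided rule of Example 3 can be sketched as follows. The code is an illustration under the assumptions of the example (means 0 and ±μ with μ > 0, unit variance, equal prior weights); the function name `two_sided_preference`, the return labels and the data are hypothetical.

```python
from itertools import accumulate

def two_sided_preference(xs, mu):
    """Return ('none', n), ('+', l) or ('-', m) for means 0, +mu, -mu (mu > 0)."""
    n = len(xs)
    s = [0.0] + list(accumulate(xs))          # s[i] = x_1 + ... + x_i
    # Sums from k = i+1 to n of (x_k - mu/2) and of (x_k + mu/2).
    up = [s[n] - s[i] - (n - i) * mu / 2 for i in range(n)]
    down = [s[n] - s[i] + (n - i) * mu / 2 for i in range(n)]
    l = max(range(n), key=up.__getitem__)
    m = min(range(n), key=down.__getitem__)
    plus_ok, minus_ok = up[l] <= 0, down[m] >= 0   # the two inequalities
    if plus_ok and minus_ok:
        return ("none", n)
    if minus_ok:                 # only the first inequality is unsatisfied
        return ("+", l)
    if plus_ok:                  # only the second inequality is unsatisfied
        return ("-", m)
    # Both unsatisfied: compare the two one-sided preferences directly.
    return ("+", l) if up[l] >= -down[m] else ("-", m)

# Illustrative data (made up): upward shift of about 1 after four observations.
data = [0.1, -0.3, 0.2, -0.1, 1.2, 0.9, 1.1, 0.8]
result = two_sided_preference(data, mu=1.0)
```

For these data only the first inequality is unsatisfied and the maximum occurs at l = 4, so an upward change after the fourth observation is preferred; reversing the signs of the observations gives the corresponding downward change.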










For a graphical representation it is convenient to consider the graph of the cumulative sum of the
observations themselves, $S_j = \sum_{i=1}^{j} x_i$, against j. The conditions (4·2) and (4·3) are then expressible as
whether or not the lines $y - S_n = \pm\tfrac12\mu(x - n)$ contain the path and, if the path intersects both lines, which
of the intersections is the deeper. The lines concerned are those drawn through the end point of the path,
with slopes $\pm\tfrac12\mu$.
More generally, for distributions of Koopman's form with a sufficient statistic and for which $A(\theta)$
is monotonic, the cumulative sum $\sum_{j=1}^{i} B(x_j)$ can be plotted and lines with slopes


$$s' = -\{C(\theta') - C(\theta)\} / \{A(\theta') - A(\theta)\}, \qquad s'' = -\{C(\theta'') - C(\theta)\} / \{A(\theta'') - A(\theta)\}$$


drawn through the end point. It may be noted that when the procedure is applied as it stands to a sequence
of 0, 1 values from a binomial population then $H_n$, the hypothesis that no change has occurred, is never
preferred; if the last observation is '1', one of the $H^+$ hypotheses is preferred to $H_n$, and similarly if $x_n = 0$.
This disadvantage does not necessarily persist if the observations are grouped.

If the hypothesis of no change, $H_n$, is a priori c times as likely as any of the alternative hypotheses
the procedure gives rise to two sets of inequalities like (2·8) for the comparisons between $H_n$ and the
$H^+$'s or the $H^-$'s; in the case of Koopman's distributions these yield boundaries with slopes $s'$, $s''$, which
pass through the points $(n, S_n + d(\theta, \theta''))$, $(n, S_n + d(\theta, \theta'))$, where $d(\theta, \theta^*)$ is a function of $\theta$, $\theta^*$ determined
by the distribution of x. For comparisons of one of the $H^+$'s with an $H^-$, the condition is the same as
when c = 1, i.e. it depends upon the deeper of the intersections with the above boundaries with d = 0
(if $d(\theta, \theta'') = -d(\theta, \theta')$ this is of course equivalent to preferring the hypothesis indicated by the deeper
intersection of the boundaries as drawn). An approximation to the positions of the boundaries if this
procedure is to be used as a test can be obtained from the one-sided test and its diffusion process
approximation. Armitage (1957) has shown that the error in this approximation is small enough to be neglected
in most practical cases.


5. CONCLUSION AND SUMMARY 


A one-sided test for a change in a parameter within a set of observations in the order drawn is given, 
and an approximation to the boundary is derived. The test is shown to arise from a discrimination pro- 
cedure and the corresponding two-sided test is quoted. These tests are similar to the ‘restricted sequential 
procedures’ proposed by Armitage in another context, where divergent boundaries and an upper limit 
to the sample size are imposed at the outset. Here our sample size is fixed and the boundaries are drawn 
at the end of the sampling. 

It is interesting also to compare the procedures given here with those derived by Rao (1950) for 
sequential tests of null hypotheses when nothing is specified about the alternative hypotheses. Rao’s 
approach used the locally most powerful best test to find a criterion which, if satisfied in less than a 
fixed number of observations, would lead to the rejection of the null hypothesis. Thus for a one-sided 
test of the mean, μ, of a normal population of known standard deviation, Rao’s test is to reject the null hypothesis if $\sum_{i=1}^{n}(x_i - \mu) > k$ for some n (0 < n < N), which is similar to our test when it is looked at in the reverse direction, although the cumulative sum plotted is different.

Rao’s test for distributions of Koopman’s form considers the cumulative sums of B(x)A′(θ) + C′(θ),
instead of using the differences as in (2-9), and will, of course, have a criterion different from (2-9) in 
general. For example, Rao’s test for the standard deviation of a normal population of known mean 
plots a cumulative sum which has a mean path of slope zero on the null hypothesis while the cumulative 
sum of our procedure has mean path with slope different from zero on all the hypotheses. 


I wish to thank Dr P. Armitage for reading this note in draft and for allowing me to see his paper on 
Restricted Sequential Procedures before publication. 


REFERENCES 


Armitage, P. (1957). Biometrika, 44, 9.
Bartlett, M. S. (1955). Stochastic Processes. Cambridge University Press.
Lindley, D. V. (1953). J. R. Statist. Soc. B, 15, 30.
Page, E. S. (1955). Biometrika, 42, 523.
Rao, C. R. (1948). J. R. Statist. Soc. B, 10, 159.
Rao, C. R. (1950). Sankhyā, 10, 361.










Miscellanea 253 


Testing for departure from the exponential distribution* 


By D. J. BARTHOLOMEW 
Scientific Department, National Coal Board, London 


1. INTRODUCTION 


Occasions arise in statistical practice when it is necessary to test whether a random variable, t, has the exponential distribution

$$p(t) = \lambda e^{-\lambda t} \quad (t > 0),$$

on the basis of an observed sample containing n independent observations, say $t_1, t_2, \ldots, t_n$. Examples are to be found in life testing and the study of the distribution of intervals between events occurring in time or space (see, for example, Maguire, Pearson & Wynn, 1952). The object of this paper is to compare the power of three tests of whether a sample is exponentially distributed. The statistics considered are:


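$$M = -2\left[\sum_{i=1}^{n}\log_e t_i - n\log_e \bar t\right],$$

$$S = \sum_{i=1}^{n} t_i^2 \Big/ \Big(\sum_{i=1}^{n} t_i\Big)^2,$$

$$\omega = \sum_{i=1}^{n} |t_i - \bar t| \Big/ 2n\bar t,$$

where $\bar t = \sum_{i=1}^{n} t_i / n$.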


These criteria have been discussed in the work of a number of writers, in particular by Darling (1953), 
who gave a unified treatment of the distribution theory. 
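
As an illustration only, the three criteria can be evaluated for a sample as in the sketch below. The forms used for S (sum of squares over squared total) and ω (half the mean deviation over the mean) are assumptions, chosen to agree with the expectations quoted in § 3; the function name `exponentiality_statistics` is hypothetical and is not taken from any library.

```python
import math

def exponentiality_statistics(times):
    """Return (M, S, omega) for a sample of strictly positive observations.

    Assumed forms (inferred from the expectations quoted in section 3):
      M     = -2 [ sum(log t_i) - n log(mean) ]
      S     = sum(t_i^2) / (sum t_i)^2
      omega = sum |t_i - mean| / (2 n mean)
    """
    n = len(times)
    total = sum(times)
    mean = total / n
    # M depends on the logarithms, so a recorded zero makes it infinite.
    m_stat = -2.0 * (sum(math.log(t) for t in times) - n * math.log(mean))
    s_stat = sum(t * t for t in times) / total ** 2
    omega = sum(abs(t - mean) for t in times) / (2.0 * n * mean)
    return m_stat, s_stat, omega
```

All three quantities are unchanged when every observation is multiplied by the same constant, so none of them requires the parameter λ to be known.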

The power function of a test can only be found by reference to some alternative to the null hypothesis. 
Which alternative is appropriate in a given circumstance depends entirely on the particular nature of 
the problem in hand. In this paper, therefore, our aim will be the general one of trying to answer such a 
question as, ‘Under what kind of alternatives is S more (or less) powerful than ω?’ Such information
should be of value in deciding which test to use. 

Four alternatives will be considered and the relative power of the three tests determined with respect 
to each of them. It has not been possible to obtain the exact power functions or even good approximations
to them in the majority of cases. Instead, use has been made of the well-known result concerning the 
asymptotic relative efficiency of tests. An instructive discussion of this method has been given by Cox 
& Stuart (1955). In one case (see § 4 below) where a fair approximation to the power of both M and S 
has been obtained, the asymptotic method is shown to be satisfactory for sample sizes of the order of 100. 




2. ALTERNATIVES TO THE EXPONENTIAL 


The four alternatives with which we shall be concerned have been selected because they show the strong 
and weak points of the tests. For a full discussion of the ways in which these and other alternatives can 
arise reference may be made to Bartholomew (1955). 


Alternative I   $p(t) = \lambda\{\Gamma(a)\}^{-1}(\lambda t)^{a-1}e^{-\lambda t}$ (t > 0, a > 0).

Moran (1951) suggested this alternative and showed that M was the asymptotically most powerful test of the hypothesis a = 1. The curve has a zero ordinate at t = 0 for a > 1 and an infinite ordinate if 0 < a < 1.

Alternative II   $p(t) = \lambda(\lambda t)^{a-1}\exp\{-(\lambda t)^a/a\}$ (t > 0, a > 0).

This distribution is usually associated with the name of Weibull. It is similar in shape to I above for a near to unity, having initial values equal to infinity and zero for 0 < a < 1 and a > 1 respectively.

Alternative III   $p(t) = \lambda(1 + a\lambda t)^{-(a+1)/a}$ (t > 0, a > 0).

In contrast to the preceding curves, this alternative is always J-shaped with a finite ordinate at t = 0. Maguire et al. gave it as an alternative to the exponential. It is a special case of a class considered by


* The work contained in this paper is based on part of a thesis approved for the Ph.D. degree in the 
University of London. 










Neyman (1941) in connexion with testing for homogeneity of variance. He arrived at the criterion S as being appropriate for testing the hypothesis a = 0. In the Pearson system this is a Type XI curve.

Alternative IV   $p(t) = \lambda e^{-\lambda t}\{1 + a(\tfrac{1}{2}\lambda^2 t^2 - 2\lambda t + 1)\}$ (t ≥ 0).

This curve is similar in shape to III. It is a special case of the Laguerre series which is obtained from the Pearson Type III distribution in the same way as the Gram-Charlier curves are obtained from the normal distribution. Cox & Smith (1952) gave it as a limiting form of the distribution of intervals between the events in a ‘pooled output’.


3. POWER OF THE TESTS 


The analysis of this section is based on the following result quoted from Cox & Stuart (1955). 

If there are two consistent test statistics $\phi_1$ and $\phi_2$ of a hypothesis $H_0$: $\theta = \theta_0$, the asymptotic relative efficiency (A.R.E.) is the reciprocal of the ratio of sample sizes required to attain the same power against the same alternative $H_1$, taking the limit as the sample sizes tend to infinity and as $H_1$ tends to $H_0$.

If $\phi_1$ and $\phi_2$ both have normal limiting distributions on $H_0$ and $H_1$, the A.R.E. of $\phi_1$ compared to $\phi_2$ is given by

$$\text{A.R.E.}(\phi_1, \phi_2) = \lim_{n\to\infty}\left\{\frac{R^2(\phi_1)}{R^2(\phi_2)}\right\}^{1/(2r)},$$

where

$$R^2(\phi_i) = \left\{\left.\frac{\partial}{\partial\theta}\mathscr{E}(\phi_i \mid \theta)\right|_{\theta=\theta_0}\right\}^2 \Big/ \operatorname{var}(\phi_i \mid \theta = \theta_0),$$

provided that r satisfies the equations

$$\lim_{n\to\infty} R(\phi_i)\,n^{-r} = R_i \quad (i = 1, 2).$$

The $R_i$ are constants independent of n, which is the sample size. In all cases considered in this paper r = ½. For brevity we shall write

$$E = \left.\frac{\partial}{\partial\theta}\mathscr{E}(\phi \mid \theta)\right|_{\theta=\theta_0}, \qquad V = \operatorname{var}(\phi \mid \theta = \theta_0).$$

It should be noted that it is only necessary to know the limiting values of E and V in order to calculate the A.R.E. This fact will enable us to find lim R² when the mean value of the criterion under the alternative cannot be found exactly.

As to the condition of limiting normality it is known that each of the three statistics satisfies the condition on the null hypothesis. It is known further from a theorem of Moran (1947) that S satisfies it under any alternative. No such general result appears to be available for M and ω although in the special case of the Type III alternative M is normally distributed in the limit. It seems very likely that M and ω are normally distributed in the limit but even if this is not so the A.R.E. still provides a useful local comparison of power. The following results must be interpreted in the light of this remark.

We now proceed to calculate E for M, S and ω under the four alternatives listed above. For alternatives I and II the null hypothesis corresponds to a = 1 and for III and IV to a = 0.


I. (a) When the observations are distributed in the Type III form it can be readily shown that

$$\mathscr{E}(M) = 2\{-n\log_e n - n\psi(a) + n\psi(na)\},$$

where $\psi(x) = \dfrac{d}{dx}\log_e\Gamma(x)$.

Differentiating we find

$$\left.\frac{\partial\mathscr{E}(M)}{\partial a}\right|_{a=1} = -2\{n\psi'(1) - n^2\psi'(n)\},$$

which, taking the limit as n tends to infinity, gives

$$\lim E = -2n(\psi'(1) - 1). \quad (1)$$

We write ‘lim’ for $\lim_{n\to\infty}$ throughout this section.

(b) The expected value of S under this alternative was shown by Moran (1951) to be (a + 1)/(an + 1). This leads immediately to the result

$$\lim E = -n^{-1}. \quad (2)$$







(c) The exact value of $\mathscr{E}(\omega \mid a)$ has not been obtained but we have

lim $\mathscr{E}(\omega \mid a)$ = ½(Type III mean deviation)/(Type III mean).

Straightforward calculation shows this to be

$$I_a(a) - I_a(a + 1),$$

where $I_a(a) = \int_0^a t^{a-1}e^{-t}\,dt/\Gamma(a)$.

Differentiating and simplifying, we ultimately find

$$\lim E = \gamma/e + 1 - 2/e - 0.6321, \quad (3)$$

where γ = Euler’s constant = 0.5772....
II. (a) For the Weibull distribution we find

$$\lim\mathscr{E}(M) = 2n\{\log_e\Gamma(1 + 1/a) - \psi(1)/a\}$$

and

$$\lim E = 2n(\psi(1) - \psi(2)) = -2n. \quad (4)$$

(b)

$$\lim\mathscr{E}(S) = \Gamma(1 + 2/a)/n\{\Gamma(1 + 1/a)\}^2$$

and

$$\lim E = 4(\psi(2) - \psi(3))/n = -2n^{-1}. \quad (5)$$

(c) The limiting value of the mean of ω takes the rather complicated form

$$\lim\mathscr{E}(\omega) = 1 - e^{-\epsilon} + \sum_{r=0}^{\infty}\frac{(-1)^{r+1}}{r!}\,\frac{\epsilon^{\,r+1+1/a}}{r+1+1/a}\Big/\Gamma(1 + 1/a),$$

where $\epsilon = \{\Gamma(1 + 1/a)\}^a$; thus

$$\lim E = -\psi(2)(1 - 2/e) + \sum_{r=0}^{\infty}\frac{(-1)^{r+1}}{r!\,(r+2)^2}.$$

Summing the series on the right numerically we reach the result

$$\lim E = -\{\psi(2)(1 - 2/e) + 0.1645\}. \quad (6)$$


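III. (a) Under the third alternative

$$\lim\mathscr{E}(M) = 2n\{a(1 + \psi(2) - \psi(3)) - \psi(1)\},$$

and therefore

$$\lim E = 2n(1 + \psi(2) - \psi(3)) = n. \quad (7)$$

(b) $\lim\mathscr{E}(S) = 2(1 - a)/(1 - 2a)$,

$$\lim E = 2. \quad (8)$$

(c) $\lim\mathscr{E}(\omega) = (1 - a)^{(1/a) - 1}$,

$$\lim E = \tfrac{1}{2}e^{-1}. \quad (9)$$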


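IV. (a) Under the fourth alternative we find

$\lim\mathscr{E}(M) = -\psi(1) - \tfrac{1}{2}a$,

$$\lim E = -\tfrac{1}{2}. \quad (10)$$

(b) $\lim\mathscr{E}(S) = (1 + 2a)/n$,

$$\lim E = 2. \quad (11)$$

(c) $\lim\mathscr{E}(\omega) = (1 - \tfrac{1}{2}a)/e$,

$$\lim E = -\tfrac{1}{2}e^{-1}. \quad (12)$$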


The foregoing results are brought together in Table 1. For each alternative the criteria are compared with the best test to give the A.R.E. The roman numerals at the head of the columns refer to the numbering of the alternatives in the text.
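
The arithmetic behind the A.R.E. entries can be sketched as follows for alternatives I to III, taking r = ½ so that the A.R.E. is the ratio of the values of E²/V. This is a check under stated assumptions rather than the author's computation: the limiting variances are taken as 4n(ψ′(1) − 1), 4n⁻³ and 0.0591n⁻¹, and the slopes E for S are assumed to be of order n⁻¹ (−1/n, −2/n and 2/n), which is what the variance 4n⁻³ requires for E²/V to grow like n. The function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329      # Euler's constant
TRIGAMMA_1 = math.pi ** 2 / 6         # psi'(1)
PSI_2 = 1.0 - EULER_GAMMA             # psi(2)
INV_E = math.exp(-1.0)

def asymptotic_relative_efficiencies():
    """A.R.E. of M, S and omega relative to the best test, alternatives I-III.

    Every E^2 / V is proportional to n, so only the coefficients are kept:
    M has E ~ c*n and V ~ v*n; S has E ~ c/n and V ~ 4/n^3 (assumed scaling);
    omega has E ~ c and V ~ 0.0591/n.
    """
    var_coef = {"M": 4.0 * (TRIGAMMA_1 - 1.0), "S": 4.0, "omega": 0.0591}
    slope_coef = {
        "I": {"M": -2.0 * (TRIGAMMA_1 - 1.0), "S": -1.0,
              "omega": EULER_GAMMA * INV_E + 1.0 - 2.0 * INV_E - 0.6321},
        "II": {"M": -2.0, "S": -2.0,
               "omega": -(PSI_2 * (1.0 - 2.0 * INV_E) + 0.1645)},
        "III": {"M": 1.0, "S": 2.0, "omega": 0.5 * INV_E},
    }
    table = {}
    for alternative, slopes in slope_coef.items():
        r_squared = {test: slopes[test] ** 2 / var_coef[test] for test in slopes}
        best = max(r_squared.values())
        table[alternative] = {test: value / best for test, value in r_squared.items()}
    return table
```

Under these assumptions the ratios agree with the corresponding entries of Table 1 to within a unit in the second decimal place.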


4. THE RELIABILITY OF THE A.R.E. 
The results of § 3 are asymptotic properties of the tests. The object of this section is to see how far these limiting properties hold good in finite samples for one special case. Approximations to the power of both M and S can be obtained under the Type III alternative by fitting curves having the same moments. The calculation of β₁ and β₂ for M under this alternative suggests using an approximation of the form

$$M = C\chi^2_\nu,$$

where C and ν are determined from the first two moments of M. Investigations into the form of the distribution of S, in the null case (Bartholomew, 1955), suggest that it might be fairly well approximated by a lognormal distribution.

According to the figures in Table 1 we should expect the power of S using 100 observations to be almost 
the same as that of M on 39 observations. The power functions obtained in the way described above are 
compared in Table 2. In view of the approximate nature of the comparison the agreement is good; this 
suggests that the asymptotic properties may be used with some confidence for sample sizes of this order. 


Table 1. Values of the asymptotic relative efficiency of M, S and ω under four alternatives

Test         Limiting variance            Alternative law
statistic    on null hypothesis       I       II      III     IV

M            4n(ψ′(1) − 1)            1.00    1.00    0.38    0.38
S            4n^{-3}                  0.39    0.64    1.00    1.00
ω            0.0591n^{-1}             0.63    0.83    0.57    0.57





Table 2. Comparison of the power of M and S under the Type III alternative

a                1.0      0.9      0.8      0.7      0.6      0.5      0.4

M (39 obs.)      0.050    0.142    0.326    0.591    0.840    0.969    0.998
S (100 obs.)     0.050    0.140    0.316    0.560    0.794    0.939    0.990








5. CONCLUDING REMARKS 


Table 1 shows that the form of the alternative has a pronounced effect on the relative power of the three tests considered. On alternatives I and II, M is clearly the best test to use while the very reverse is true for III and IV. The reason for this can be appreciated by noting that it is the relatively small intervals that are most important in determining whether M is significant. We should therefore expect M to be the most powerful under alternatives showing a marked departure from the exponential near t = 0. This is in fact the case for I and II which take either zero or infinite values at this point. On the other hand, the value of S, which is a sum of squares, is influenced most by the largest values and so would be sensitive to departure from the exponential at the upper tail. The ω test takes up an intermediate position in both cases, giving less weight to the small values than does M and more than S, and vice versa for the large values.

In some instances it may be possible to decide, a priori, on an alternative distribution, but it is doubtful whether this will be so in general. Under the latter circumstance ω could be used on the grounds that it will always be second best, whereas either of M or S could be the worst. In this way we minimize the maximum loss in power which might be incurred by a wrong choice of test.

Even if it is known that M provides the best test, it can only be used in practice if the small values of t are recorded to high accuracy. This is not often the case; in fact very small values are frequently recorded as zero, thus making M infinite. Bearing in mind that the significance of M can depend to a large extent on inaccuracies in recording, it would be safer to use ω when there is any doubt.


My thanks are due to Dr N. L. Johnson, for his advice throughout this work, and to the Department 
of Scientific and Industrial Research for the award of a maintenance grant during the period when the 
research was carried out. 







REFERENCES 


Bartholomew, D. J. (1955). Ph.D. Thesis, University of London.
Cox, D. R. & Smith, W. L. (1952). Biometrika, 40, 1.
Cox, D. R. & Stuart, A. (1955). Biometrika, 42, 80.
Darling, D. A. (1953). Ann. Math. Statist. 24, 239.
Maguire, B. A., Pearson, E. S. & Wynn, A. H. A. (1952). Biometrika, 39, 168.
Moran, P. A. P. (1947). J. R. Statist. Soc. Suppl. 9, 92.
Moran, P. A. P. (1951). J. R. Statist. Soc. B, 13, 147.
Neyman, J. (1941). Ann. Math. Statist. 12, 46.


The distribution of range in normal samples with n = 200 


By B. I. HARLEY and E. S. PEARSON
University College, London 


1. INTRODUCTORY 


In his paper of 1925, Tippett considered the distribution of the range, w, in samples from a normal population, chiefly from the point of view of the moments, over the whole stretch from samples of n = 2 to n = 1000. He tabled the expected value of the range (in terms of the population standard deviation, σ, as unit) to five decimal places for n = 2 (1) 1000, but only gave values of the standard deviation and beta coefficients for n = 2, 10, 20, 60, 100, 200, 500, 1000. His standard deviations were given to three decimal places, but he considered that little reliance could be placed on the third decimal place in the β₁ and β₂ values. After 1925, attention was focused on the distribution of range in small samples, largely because of the usefulness of this statistic as a measure of variability in industrial quality control. Thus Pearson & Hartley’s (1942) tables gave the probability integral of w to four decimal places for n = 2 (1) 20, by intervals of 0.05 in w.

Recently, however, suggestions have been made* that a comparison of the range and root mean-square estimators of the population σ may serve a useful purpose in quite large samples, either as a test of homogeneity or as a routine check on accuracy in computation. Tukey (1955) discussed certain methods of interpolation which he showed could be used with some success to calculate such quantities as the percentage points of w or of w/s† within the very open grid of large sample information which has been scarcely extended since Tippett’s original calculations.‡ He pointed out, however, that satisfactory results could hardly be established finally until one or more highly accurate computations had been carried out between, say, n = 100 and 1000. The tables and derived values given below for the single case n = 200 are presented as a contribution towards this objective.


2. DESCRIPTION OF RESULTS 


Table 1 gives the probability integral of the range, w, to seven decimal places at argument intervals of 0.25. Each value was computed independently by quadrature as described in § 3 below. The last figure may be in error by one or two units in the middle of the table, but it was decided to put the seven-figure results on record, rather than to cut down to six decimals. Table 2, giving the probability integral to four decimal places at the more convenient argument interval of 0.05, was derived by interpolation from the basic figures of Table 1. From the latter it was also possible to calculate the moments of the distribution of w by quadrature, using the formula

$$\mu_r'(w) = (-w_0)^r + r\int_0^\infty (W - w_0)^{r-1}\,Q(W)\,dW, \quad (1)$$


* See for example, David, Hartley & Pearson (1954).

† Here s is the usual standard deviation estimate of σ calculated from the same sample as w.

‡ Additional values of the second moment of w at n = 30, 45 and 75 were computed for one of us (Pearson, 1932). With Ruben’s (1954, p. 224) highly accurate values of the mean range now available for n = 30, 45, it is possible to improve on the figures in 1932, as follows:

Sample size            30          45          75
Variance of w          0.479783    0.435684    0.38901
Standard deviation     0.69266     0.66006     0.6237



where $w_0$ is a convenient working origin near the mean, taken in this case as 5.5, and P(W) = 1 − Q(W) is the probability integral. As usual in carrying out a quadrature of this kind, when we wish to stretch available information to the limit, it is a little difficult to be sure of the exact degree of accuracy of the results obtained. As a check, we computed moments by applying quadrature to equation (1) in three ways: using (a) the seven decimal and (b) rounded off six decimal place values of Q(W) at intervals of 0.25; (c) using only the alternate seven decimal place values, i.e. values at an argument interval of 0.50.
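
A quadrature of this kind can be sketched from the seven-decimal values of Table 1. The sketch below is not the authors' desk procedure: it assumes Simpson's rule over the tabulated grid, takes Q(W) = 1 below W = 3.25 so that the part of the integral on (0, 3.25) is done exactly, and uses hypothetical names throughout. It is limited to the mean and the variance.

```python
# P(W) from Table 1 for W = 3.25 (0.25) 9.75.
P_TABLE = [
    0.0000000, 0.0000003, 0.0000181, 0.0003917, 0.0040133, 0.0225924,
    0.0793777, 0.1936026, 0.3579398, 0.5390411, 0.7006038, 0.8224992,
    0.9029409, 0.9505744, 0.9763683, 0.9893214, 0.9954158, 0.9981226,
    0.9992642, 0.9997233, 0.9998999, 0.9999652, 0.9999883, 0.9999962,
    0.9999988, 0.9999996, 1.0000000,
]
STEP = 0.25
W_START = 3.25
W_ORIGIN = 5.5   # working origin w0

def simpson(ordinates, step):
    """Composite Simpson rule for an odd number of equally spaced ordinates."""
    if len(ordinates) % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of ordinates")
    return step / 3.0 * (ordinates[0] + ordinates[-1]
                         + 4.0 * sum(ordinates[1:-1:2])
                         + 2.0 * sum(ordinates[2:-1:2]))

def range_mean_and_variance():
    """Mean and variance of w from equation (1) with r = 1 and r = 2."""
    grid = [W_START + STEP * i for i in range(len(P_TABLE))]
    q = [1.0 - p for p in P_TABLE]
    # r = 1: the integral of Q over (0, W_START) is W_START because Q = 1 there.
    mean = W_START + simpson(q, STEP)
    # r = 2: w0^2 plus twice the integral of (W - w0) over (0, W_START)
    # equals (W_START - w0)^2; the remainder is done by quadrature.
    weighted = [(w - W_ORIGIN) * qi for w, qi in zip(grid, q)]
    second_about_origin = (W_START - W_ORIGIN) ** 2 + 2.0 * simpson(weighted, STEP)
    variance = second_about_origin - (mean - W_ORIGIN) ** 2
    return mean, variance
```

Simpson's rule is used here in place of a plain sum of ordinates because, for r = 2, the integrand is not flat at the lower end of the grid and a simple trapezoidal sum would then carry an end correction of order the square of the interval.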


Table 1. Basic values of the probability integral

 W       P(W)           W       P(W)           W       P(W)

                       6.00    0.8224992      9.00    0.9999962
3.25    0.0000000      6.25    0.9029409      9.25    0.9999988
3.50    0.0000003      6.50    0.9505744      9.50    0.9999996
3.75    0.0000181      6.75    0.9763683      9.75    1.0000000
4.00    0.0003917      7.00    0.9893214
4.25    0.0040133      7.25    0.9954158
4.50    0.0225924      7.50    0.9981226
4.75    0.0793777      7.75    0.9992642
5.00    0.1936026      8.00    0.9997233
5.25    0.3579398      8.25    0.9998999
5.50    0.5390411      8.50    0.9999652
5.75    0.7006038      8.75    0.9999883











Table 2. Probability integral to four decimal places at argument interval 0.05

 W      P(W)       W      P(W)       W      P(W)       W      P(W)       W      P(W)

3.75   0.0000     4.75   0.0794     5.75   0.7006     6.75   0.9764     7.75   0.9993
3.80   0.0000     4.80   0.0974     5.80   0.7284     6.80   0.9798     7.80   0.9994
3.85   0.0001     4.85   0.1178     5.85   0.7546     6.85   0.9827     7.85   0.9995
3.90   0.0001     4.90   0.1407     5.90   0.7790     6.90   0.9852     7.90   0.9996
3.95   0.0002     4.95   0.1660     5.95   0.8016     6.95   0.9874     7.95   0.9997

4.00   0.0004     5.00   0.1936     6.00   0.8225     7.00   0.9893     8.00   0.9997
4.05   0.0007     5.05   0.2233     6.05   0.8417     7.05   0.9909     8.05   0.9998
4.10   0.0011     5.10   0.2548     6.10   0.8593     7.10   0.9923     8.10   0.9998
4.15   0.0017     5.15   0.2880     6.15   0.8753     7.15   0.9935     8.15   0.9998
4.20   0.0027     5.20   0.3225     6.20   0.8898     7.20   0.9946     8.20   0.9999

4.25   0.0040     5.25   0.3579     6.25   0.9029     7.25   0.9954     8.25   0.9999
4.30   0.0059     5.30   0.3941     6.30   0.9147     7.30   0.9962     8.30   0.9999
4.35   0.0085     5.35   0.4306     6.35   0.9253     7.35   0.9968     8.35   0.9999
4.40   0.0121     5.40   0.4671     6.40   0.9347     7.40   0.9973     8.40   0.9999
4.45   0.0167     5.45   0.5034     6.45   0.9431     7.45   0.9977     8.45   1.0000

4.50   0.0226     5.50   0.5390     6.50   0.9506     7.50   0.9981
4.55   0.0301     5.55   0.5739     6.55   0.9572     7.55   0.9984
4.60   0.0393     5.60   0.6076     6.60   0.9629     7.60   0.9987
4.65   0.0505     5.65   0.6401     6.65   0.9680     7.65   0.9989
4.70   0.0638     5.70   0.6711     6.70   0.9725     7.70   0.9991



















On the basis of these evaluations we believe that the following figures should not be in error by more than one or two units in the last place:

Mean w = 5.4920853, σ_w = 0.565992, μ₂ = 0.320347,
μ₃ = 0.090935, μ₄ = 0.35316, β₁ = 0.25153, β₂ = 3.4414.

From Table 1 we have obtained by backward interpolation the percentage points for w shown in Table 3. It was found that if the tables of standardized 0.5, 1.0, 2.5 and 5 % points of Pearson curves (Pearson & Hartley, 1954, Table 42) are used with the moments given above, we obtain almost complete agreement* to two decimal places with the figures given in Table 3. This means that to the accuracy considered, the distribution of w must be very closely representable by a Type VI or inverted beta curve.


Table 3. Percentage points of range in samples of n = 200

Percentage       0.1      0.5      1.0      2.5      5.0      10.0

Lower points     4.094    4.281    4.372    4.518    4.648    4.806
Upper points     7.670    7.225    7.020    6.731    6.496    6.238





Finally it seemed worth examining whether the values of the percentage points for w/s given by 
David et al. (1954, p. 491) would be altered substantially now that more accurate values of the 
moments of w were available. Using the same method as these authors (p. 483) we derived the values 
shown for comparison in Table 4. For practical purposes, the discrepancies are not serious. We have not 
attempted to use our results to improve the empirical formulae suggested by Tukey (1955). To get the 
best out of his method of attack further accurate computations would seem to be needed, perhaps as 
he suggests for n = 1000. To carry these out, the desk computing methods we have used, as described 
in the next section, would seem to involve a prohibitive amount of labour. 


Table 4. Percentage points of w/s

                           Lower percentage points         Upper percentage points
                           0.5     1.0     2.5     5.0     5.0     2.5     1.0     0.5

David et al.               4.50    4.56    4.67    4.78    6.38    6.59    6.85    7.03
From improved moments      4.53    4.59    4.68    4.78    6.39    6.60    6.84    7.01





3. METHOD OF COMPUTATION 


The probability integral of range in a normal sample of size n, P(W) = Pr{w < W}, may be expressed in the form

$$P(W) = \left\{\int_{-\frac{1}{2}W}^{\frac{1}{2}W} z(x)\,dx\right\}^{n} + 2n\int_{\frac{1}{2}W}^{\infty} z(u)\left\{\int_{u-W}^{u} z(x)\,dx\right\}^{n-1} du, \quad (2)$$

where $z(x) = (2\pi)^{-\frac{1}{2}}e^{-\frac{1}{2}x^2}$. When n = 200 the main work of computation consists in evaluating the second integral in (2) or

$$400\int_{\frac{1}{2}W}^{\infty} z(u)\left\{\int_{u-W}^{u} z(x)\,dx\right\}^{199} du \quad (3)$$

by quadrature. As seven decimal accuracy in P(W) was aimed at, it was necessary to calculate $z(u)\left\{\int_{u-W}^{u} z(x)\,dx\right\}^{199}$ for a series of values of u from ½W up to the point where the expression became less than 1 unit in the 10th decimal place. The argument interval for u used in the quadrature had sometimes to be as small as 0.025. Values of z(u) and $\int_{u-W}^{u} z(x)\,dx$ to 15 decimal places were taken from the National Bureau of Standards (1953) tables of the normal probability function, the integral being


* Only one of the eight values differed by as much as 0-005. 










raised to the 199th power using tables of 15 figure logarithms. The first term in equation (2) was 
obtainable without quadrature, using the same tables. 
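
With a modern machine the evaluation of (2) can be sketched in a few lines. The following is an illustration under assumptions, not a reproduction of the desk method: the normal integral is taken from the error function, Simpson's rule is applied with a fixed interval of 0.005, and the upper limit is cut off at ½W + 10, beyond which the integrand is negligible. The function names are hypothetical.

```python
import math

def normal_density(x):
    """z(x), the standard normal ordinate."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_interval(lower, upper):
    """Integral of z(x) from lower to upper, via the error function."""
    root2 = math.sqrt(2.0)
    return 0.5 * (math.erf(upper / root2) - math.erf(lower / root2))

def range_probability_integral(w, n=200, step=0.005, span=10.0):
    """P(W) from equation (2), the second term by Simpson's rule.

    step and span are assumed settings for this sketch, not values
    taken from the paper.
    """
    first_term = normal_interval(-0.5 * w, 0.5 * w) ** n
    panels = int(round(span / step))
    if panels % 2:
        panels += 1                      # Simpson's rule needs an even count
    start = 0.5 * w

    def integrand(u):
        return normal_density(u) * normal_interval(u - w, u) ** (n - 1)

    total = integrand(start) + integrand(start + panels * step)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * integrand(start + k * step)
    return first_term + 2.0 * n * (step / 3.0) * total
```

For example, `range_probability_integral(5.5)` may be compared with the entry 0.5390411 of Table 1.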


REFERENCES 


David, H. A., Hartley, H. O. & Pearson, E. S. (1954). Biometrika, 41, 482-93.
National Bureau of Standards (1953). Tables of Normal Probability Functions. Applied Mathematics Series, no. 23. Washington: U.S. Department of Commerce.
Pearson, E. S. (1932). Biometrika, 24, 404-17.
Pearson, E. S. & Hartley, H. O. (1942). Biometrika, 32, 301-10.
Pearson, E. S. & Hartley, H. O. (1954). Biometrika Tables for Statisticians, 1. Cambridge University Press.
Ruben, H. (1954). Biometrika, 41, 200-27.
Tippett, L. H. C. (1925). Biometrika, 17, 364-87.
Tukey, J. W. (1955). Biometrika, 42, 480-5.


Studies in the history of probability and statistics. V. A note on playing cards 


By M. G. KENDALL 
Research Techniques Unit, London School of Economics


1. In an earlier article in this series (Kendall, 1956) I referred briefly to the introduction of playing 
cards into Europe. Subsequent correspondence arising out of that reference suggests that it may be 
useful to expand a little what was there said about the impact of cards on gambling. 


2. Playing cards as we know them to-day in western Europe can be traced back in a clear line of descent 
to the beginning of the fifteenth century; but from that point backwards their history becomes more 
and more vague and their genealogy more and more fabulous. Where they originated is unknown. 
Claims have been put forward on behalf of origins in China, India, Arabia and Egypt. It is at least equally 
possible that they were independently invented in Europe. From the first the pictorial representations 
on the cards were thoroughly Western and do not suggest, to my eye at least, any trace of Eastern descent 
such, for example, as does the rook in chess.* Gambling with paper tickets is said to have been known in 
China in the twelfth century, and it is possible that the idea of playing cards drifted across to Europe 
along one of the early trade routes; but for the translation of the idea into practice one does not need to 
look further afield than fourteenth-century Italy. 


3. No mention of cards has been traced in the West before A.D. 1350, and the absence of reference in authors like Chaucer and Dante, who mentioned everything, shows that they cannot have been known much before that date. They seem to have spread in the Mediterranean countries and Germany fairly rapidly. Numerals are said to have been added to the picture cards at Venice in A.D. 1377. Cards are mentioned at Nürnberg in A.D. 1380. In A.D. 1397 there appeared an ordinance in Paris prohibiting play at various games, including cards, to that part of the population who were engaged in manufacturing. (The story that cards were invented to amuse the mad Charles VI of France is false, although packs were made for him in A.D. 1393.) Specimens of such early dates have not survived. By A.D. 1423, however, the card pack appears to have evolved into its modern form. San Bernardino, whose sermon served me in good stead in the previous article (1956), refers to charticellae in quibus variae figurae pingantur, and goes on to mention the four suits, the Kings, Queens, Valets and Chevaliers and, as I interpret him, the trump cards of the tarot pack.†


* Modern tarot packs are worthless evidence in this connexion. Under the influence of occultists, 
notably Court de Gebelin, who suggested in the eighteenth century that the trump cards incorporated 
the lost book of Thoth, and Eliphas Lévi, whose work on magic popularized the idea in the nineteenth 
century, tarot cards have acquired symbols such as sphinxes which are absent from earlier western cards. 

† The early suits were clubs, coins (diamonds), cups (hearts) and swords (spades), which San Bernardino identifies with brutality, avarice, drunkenness and hatred. The court cards denote those who are outstanding in these vices. ‘Presbyterorum et presbyterarum innumerabilem multitudinem esse volo: unde admittantur pusilli et magni, et feminae et masculi, periti et ignari, sapientes et stulti.’ A hundred and fifty years later one John Northbrooke was inspired by this passage to coin the famous description of cards as ‘the devil’s picture books’.

Miss Gertrude Moakley (1956) has recently suggested that the trump cards were representations of 
the Trionfi described by Petrarch in one of his most popular poems. 







4. The game of tarock (French tarot, Italian tarocchi) is still played in southern Europe in various
versions and is probably the oldest card game. In its modern form the tarot pack consists of 56 ordinary 
cards divided into four suits, 21 trump cards and a wild card or joker, 78 cards in all. Fifteenth-century 
packs of this type exist. There are some early packs of even larger size, notably the Minchiate pack of 
97 cards, 56 ordinary cards, 35 trump cards and 6 wild cards. Historians of playing cards also mention 
a very rare set of engravings known as the Mantegna tarot; but they are probably not by Mantegna, 
and I doubt very much whether they are tarot cards. They are engraved on sizeable but thin sheets of 
paper, and can hardly have been used in any sort of game involving shuffling and dealing; they are 
divided into five sets of ten, the first, for example, enumerating various social grades from the beggar to 
the Pope, the second giving the nine Muses and Apollo, the third giving ten branches of learning, and so 
on. My own opinion is that these cards were a teaching device and that the unknown inventor of the 
tarot pack copied some of them. Where he got the others is a mystery unless Miss Moakley is right in 
identifying them with Petrarch’s Trionfi.


5. At some unknown point of time the tarot pack was simplified, or so I believe. The trump cards were dropped, and from each suit of the 56 ordinary cards one of the court cards was also dropped. (In most countries it was the Chevalier who was dropped, but in Spain it was the Queen, for reasons which it is interesting but unprofitable to speculate upon.) Thus there evolved the basic pack of 52 cards which is in general use to-day. The tarot pack survived independently, but in northern Europe is now mainly used for fortune-telling.


6. The student of the history of probability is interested in these matters only in two respects: the 
degree to which cards extended and encouraged gambling, and the reasons for the choice of the number 
of cards in the early tarot packs. From what San Bernardino says it seems that gambling began at a very 
early stage; presumably, as soon as a game came into existence, the adversaries began to wager on the 
outcome. However, extensive gambling with cards was of very slow development. Cardano mentions 
the game of primero, but early writers on chance confine themselves mainly to dice. The reasons, I think, 
were twofold: first, the permutational arithmetic required to deal with probabilities at cards was too 
complicated; secondly, cards were very expensive and dice much more common. Cards did not oust 
dice until the eighteenth century. A third reason, possibly, is that cards and backgammon involved more 
skill and had a higher social status. James I (1603) puts it rather well: 

‘As for sitting, or human pastimes—since they may at times supply the room which, being empty, 
would be patent to pernicious idleness—I will not therefore agree with the curiosity of some learned men 
of our age in forbidding cards, dice and such like games of hazard; when it is foul and stormy weather, 
then I say, may ye lawfully play at the cards or tables; for, as to dicing, I think it becometh best deboshed
soldiers to play at on the heads of their drums, being only ruled by hazard, and subject to knavish cogging; 
and as for the chess, I think it over-fond because it is overwise and philosophic a folly.’

James, apparently, was not very good at chess, but his balanced broadmindedness in a Puritan age 
compels respect. 


7. It is interesting to inquire why the early tarot packs consisted of 56 + 21 + 1 or 56 + 35 + 6 cards, 
but before dabbling in numerology let us note how treacherous a subject it is. One can make out a very 
good case for a connexion between the modern pack of playing cards and the calendar. The four suits 
correspond to the four seasons, the thirteen cards in a suit correspond to the lunar months, the 52 cards 
to the weeks of the year. If we score 11 for a Knave, 12 for a Queen, and 13 for a King, and add all the 
points for the 52 cards we get 364, which, adding one for the joker, gives us the number of days in the 
year.* Many a historical point has been argued on less coincidental evidence than this, but there is, in 
fact, no connexion between the playing pack and the calendar. The early suit cards were 56 in number. 
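The arithmetic behind these coincidences is easily checked. The short Python sketch below assumes the scoring just described (Ace = 1 up to King = 13) and the English card names given in the footnote; the variable names are illustrative only.

```python
# Scores assumed as in the text: Ace = 1, ..., Ten = 10, Knave = 11, Queen = 12, King = 13.
SUITS = 4
RANK_SCORES = range(1, 14)

cards = SUITS * len(RANK_SCORES)     # cards in the modern pack
points = SUITS * sum(RANK_SCORES)    # total points over the whole pack
days = points + 1                    # adding one for the joker

# English names of the thirteen ranks, as listed in the footnote.
NAMES = ["Ace", "Two", "Three", "Four", "Five", "Six", "Seven",
         "Eight", "Nine", "Ten", "Jack", "Queen", "King"]
letters = sum(len(name) for name in NAMES)

print(cards, points, days, letters)  # prints: 52 364 365 52
```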


8. Nevertheless, there are some striking resemblances between the number of the cards in the early 
packs and the number of ways in which dice can fall. The main number, 56, is the number of ways in 
which three dice can be thrown, permutations excluded, and I pointed out in my previous article (1956) 
that these ways were well known by the fourteenth century. The number 21, likewise, is the number of 
ways in which two dice can be thrown, permutations excluded. The number 35 of the larger Minchiate 
tarot could arise either as the number of ways of throwing three dice when sixes are ignored, or the 
number of ways of throwing four (four-sided) astragali. I am not prepared to lean very heavily on these 
coincidences; but they suggest that perhaps the constructors of the first packs, having to choose somehow, 
were influenced by their knowledge of dice-throwing. 
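The dice counts quoted in this section can be reproduced by enumerating throws as unordered selections with repetition. The sketch below is only an illustration: the function name is hypothetical, and "sixes ignored" is read here as meaning that each die has five usable faces.

```python
from itertools import combinations_with_replacement


def unordered_throws(faces: int, dice: int) -> int:
    """Count the throws of `dice` dice with `faces` faces, permutations excluded."""
    return sum(1 for _ in combinations_with_replacement(range(faces), dice))


three_dice = unordered_throws(6, 3)           # the main number of suit cards
two_dice = unordered_throws(6, 2)             # the number of trumps in the smaller pack
three_dice_no_sixes = unordered_throws(5, 3)  # assumed reading of "sixes are ignored"
four_astragali = unordered_throws(4, 4)       # four four-sided astragali

print(three_dice, two_dice, three_dice_no_sixes, four_astragali)  # prints: 56 21 35 35
```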


* Nor is this the worst. The number of letters in the sequence Ace, Two, Three, ..., Ten, Jack, Queen, 
King, is 52; so also in the sequence As, Deux, Trois, ..., Dix, Valet, Reine, Roi and in As, Zwei, Drei, ..., 
Zehn, Bube, Dame, König, the ch being taken as one letter. I owe this information to the firm of Thos. 
de la Rue and Co. Ltd. 










9. In conclusion, I should like to correct one guess made in my earlier article. I suggested that the 
game of hazard was brought back to Europe by the third crusaders. It may, indeed, have been brought 
back in some such way, but if so, must have been imported by earlier crusaders. The word hasart occurs 
in line 10557 of Wace’s Le Roman De Brut, dated A.D. 1155, and also in Chrétien de Troyes’ Erec et 
Enide, line 356, dated A.D. 1160-70. For these references I am indebted to Prof. Brian Woledge, 
who remarks, incidentally, that the appearance of the initial ‘h’ in ‘hazard’ is an etymological 
mystery which has never been solved. 


REFERENCES 


James I (1603). Basilikon Doron, or a King’s Christian Duty towards God. 

KENDALL, M. G. (1956). Studies in the history of probability and statistics. II. The beginnings of 
a calculus of probabilities. Biometrika, 43, 1. 

MOAKLEY, GERTRUDE (1956). The tarot trumps and Petrarch’s Trionfi. Bull. N.Y. Publ. Lib. 60, 55. 


A singularity in the estimation of binomial variance 


By ALAN STUART 


Research Techniques Unit, London School of Economics 


SUMMARY. For the symmetrical binomial distribution, the limit distribution of the sample 
variance is non-normal and has variance of order $1/n^{2}$. 


1. For the binomial distribution, the sample proportion of ‘successes’, $p$, is a sufficient estimator of 
the probability parameter $\pi$, and 
$$y = p(1-p)\,n/(n-1) \qquad (1)$$
is an unbiased estimator of $\pi(1-\pi)$, which is $n$ times the sampling variance of $p$, where $n$ is sample size. 
$p(1-p)$ is the sample variance. Since $y$ is a function of the sufficient statistic, it is the minimum variance 
unbiased estimator of its expectation. Its sampling variance is 


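$$V(y) = \left(\frac{n}{n-1}\right)^{2}\{V(p) + V(p^{2}) - 2C(p, p^{2})\},$$
which is expressible in terms of the moments about the origin of $p$ as 
$$V(y) = \left(\frac{n}{n-1}\right)^{2}\{\mu_4' - 2\mu_3' + \mu_2' - (\mu_2' - \mu_1')^{2}\}.$$

On substitution for these moments we find 
$$V(y) = \frac{\pi(1-\pi)}{n}\left\{(1-2\pi)^{2} + \frac{2\pi(1-\pi)}{n-1}\right\}. \qquad (2)$$
If $\pi \neq \tfrac12$, (2) gives 
$$V(y \mid \pi \neq \tfrac12) \sim \pi(1-\pi)(1-2\pi)^{2}/n, \qquad (3)$$
while if $\pi = \tfrac12$ (2) becomes 
$$V(y \mid \pi = \tfrac12) = 1/\{8n(n-1)\}. \qquad (4)$$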


It is easily confirmed that the right-hand sides of (3) and (4) are the information bounds to the sampling 
variance of an estimator of $\pi(1-\pi)$, to order $n^{-1}$, $n^{-2}$ respectively in the two cases. 

The fact that at $\pi = \tfrac12$ the variance of $y$ is of lower order than $n^{-1}$ raises the question whether its 
limiting distribution is normal in that case. In the following sections it is shown that this is not so. 
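The exact variance formula (2), and its special case (4), can be confirmed by summing directly over the binomial distribution. The following is a minimal sketch using exact rational arithmetic; the function names are illustrative and not part of the original note.

```python
from fractions import Fraction
from math import comb


def exact_variance_of_y(n: int, pi: Fraction) -> Fraction:
    """Variance of y = p(1-p) n/(n-1), summed exactly over the binomial distribution."""
    mean = Fraction(0)
    second = Fraction(0)
    for k in range(n + 1):
        prob = comb(n, k) * pi**k * (1 - pi) ** (n - k)
        p = Fraction(k, n)
        y = p * (1 - p) * Fraction(n, n - 1)
        mean += prob * y
        second += prob * y * y
    return second - mean**2


def variance_formula_2(n: int, pi: Fraction) -> Fraction:
    """Right-hand side of equation (2)."""
    return pi * (1 - pi) / n * ((1 - 2 * pi) ** 2 + 2 * pi * (1 - pi) / (n - 1))


half = Fraction(1, 2)
print(exact_variance_of_y(10, half), variance_formula_2(10, half))  # both equal 1/720
```

At $\pi = \tfrac12$ the common value is $1/\{8n(n-1)\}$, which is $1/720$ for $n = 10$, in agreement with (4).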


2. The characteristic function of $p(1-p)$ may be written 
$$\phi_{p(1-p)}(t) = E\{\exp[\theta(p - p^{2})]\} = \exp(\theta\pi^{2})\,E\{\exp[\theta p(1-2\pi)]\,\exp[-\theta(p-\pi)^{2}]\}, \qquad (5)$$
where $\theta = it$. Since, for random variables $u$, $v$, 
$$E(uv) = E(u)E(v) + C(u, v),$$
we may replace the expectation of the product of the two exponential terms in (5) by the product of their expectations 
plus their covariance 
$$C = C\{\exp[\theta p(1-2\pi)],\ \exp[-\theta(p-\pi)^{2}]\}. \qquad (6)$$














If $\pi = \tfrac12$, the first variate on the right of (6) is a constant, so 
$$C = 0 \quad (\pi = \tfrac12). \qquad (7)$$
If $\pi \neq \tfrac12$, we may use differentials in (6) to obtain 
$$C = \{\theta(1-2\pi)\exp[\theta p(1-2\pi)]\}\,\{-\theta\exp[-\theta(p-\pi)^{2}]\}\,C\{p, (p-\pi)^{2}\} + R_n, \qquad (8)$$
the remainder $R_n$ being of lower order in $n$ than the leading term given on the right of (8). The first two factors 
on the right of (8) are to be evaluated at the true parameter point $p = \pi$, and are there of order zero in $n$. 
The third factor is 
$$C\{p, (p-\pi)^{2}\} = E\{(p-\pi)^{3}\} = \pi(1-\pi)(1-2\pi)/n^{2}$$
exactly, so we may write (8) as 
$$C = kn^{-2} + o(n^{-2}). \qquad (9)$$
Using (7) and (9), (5) becomes, for all $\pi$, 
$$\phi_{p(1-p)}(t) = \exp(\theta\pi^{2})\,\phi_{p(1-2\pi)}(t)\,\phi_{-(p-\pi)^{2}}(t) + O(n^{-2}). \qquad (10)$$
3. By the classical central limit theorem, 
$$(p-\pi)/\{\pi(1-\pi)/n\}^{1/2}$$
has a limiting normal distribution with characteristic function $\exp(\tfrac12\theta^{2})$. It follows that its square has 
a limiting $\chi^{2}$ distribution with 1 degree of freedom and characteristic function $(1-2\theta)^{-1/2}$. From these 
results, it follows that 
$$\phi_{p(1-2\pi)}(t) = \exp\{\theta\pi(1-2\pi) + \tfrac12\theta^{2}\pi(1-\pi)(1-2\pi)^{2}/n\}\,(1+o(1)),$$
$$\phi_{-(p-\pi)^{2}}(t) = \{1 + 2\theta\pi(1-\pi)/n\}^{-1/2}\,(1+o(1)). \qquad (11)$$
Substitution of (11) into (10) gives 
$$\phi_{p(1-p)}(t) = \exp\{\theta\pi(1-\pi) + \tfrac12\theta^{2}\pi(1-\pi)(1-2\pi)^{2}/n\}\,\{1 + 2\theta\pi(1-\pi)/n\}^{-1/2}\,(1+o(1)). \qquad (12)$$
From the definition of $y$ at (1), we have, correcting for the mean, 
$$\phi_{y-\pi(1-\pi)}(t) = \exp\{\theta\pi(1-\pi)/(n-1) + \tfrac12\theta^{2}\pi(1-\pi)(1-2\pi)^{2}\,n/(n-1)^{2}\}\,\{1 + 2\theta\pi(1-\pi)/(n-1)\}^{-1/2}\,(1+o(1)). \qquad (13)$$
If $\pi \neq \tfrac12$, we standardize $y$, using (3), to obtain for 
$$z = \{y - \pi(1-\pi)\}/\{V(y)\}^{1/2}$$
the characteristic function 
$$\phi_z(t) = \exp\{\theta[n\pi(1-\pi)]^{1/2}/[(n-1)(1-2\pi)] + \tfrac12\theta^{2}n^{2}/(n-1)^{2}\}\,\{1 + 2\theta[n\pi(1-\pi)]^{1/2}/[(n-1)(1-2\pi)]\}^{-1/2}\,(1+o(1)), \qquad (14)$$
so that 
$$\lim_{n\to\infty}\phi_z(t) = \exp(\tfrac12\theta^{2}) \quad (\pi \neq \tfrac12), \qquad (15)$$
and the distribution of $y$, properly standardized, is normal in the limit. 
If, in (13), $\pi = \tfrac12$, the term in $\theta^{2}$ disappears and (13) becomes 
$$\phi_{y-1/4}(t) = \exp\{\theta/[4(n-1)]\}\,\{1 + \theta/[2(n-1)]\}^{-1/2}\,(1+o(1)). \qquad (16)$$
We now standardize by (4) and obtain from (16) 
$$\phi_z(t) = \exp\{\theta(n/[2(n-1)])^{1/2}\}\,\{1 + \theta[2n/(n-1)]^{1/2}\}^{-1/2}\,(1+o(1)). \qquad (17)$$
Thus 
$$\lim_{n\to\infty}\phi_z(t) = \exp(\theta/2^{1/2})\,(1 + \theta\,2^{1/2})^{-1/2} \quad (\pi = \tfrac12), \qquad (18)$$
and the distribution is non-normal. 
From the form of (18), in fact, it is clear that 
$$1 - z\,2^{1/2}$$
has a $\chi^{2}$ distribution with 1 degree of freedom in large samples. 
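The limiting $\chi^{2}$ form can be illustrated numerically. The sketch below computes, for $\pi = \tfrac12$, the exact distribution function of $1 - z\,2^{1/2}$ from the binomial probabilities and compares it with the $\chi^{2}$ distribution function with 1 degree of freedom, written as $\operatorname{erf}\{(x/2)^{1/2}\}$. The function names and the choice $n = 2000$ are illustrative assumptions; because the exact distribution is discrete, agreement is only to about two decimal places.

```python
from math import comb, erf, sqrt


def cdf_of_transformed_y(n: int, x: float) -> float:
    """Exact P(1 - z*sqrt(2) <= x) at pi = 1/2, with z = (y - 1/4) / sqrt(V(y))."""
    sd = sqrt(1.0 / (8 * n * (n - 1)))       # standard deviation of y from (4)
    favourable = 0                           # number of equally likely outcomes counted
    for k in range(n + 1):
        p = k / n
        y = p * (1 - p) * n / (n - 1)
        z = (y - 0.25) / sd
        if 1 - z * sqrt(2) <= x:
            favourable += comb(n, k)
    return favourable / 2**n


def chi2_one_df_cdf(x: float) -> float:
    """Distribution function of chi-squared with 1 degree of freedom."""
    return erf(sqrt(x / 2))


for x in (0.5, 1.0, 3.0):
    print(x, round(cdf_of_transformed_y(2000, x), 3), round(chi2_one_df_cdf(x), 3))
```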


4. The reason for this singularity in the distribution of $y$ is clear. When $\pi = \tfrac12$, $\pi(1-\pi)$ attains its 
maximum value $\tfrac14$. $y$ is an unbiased estimator of this value, but itself has a maximum value $\tfrac14 n/(n-1)$, 
which approaches $\tfrac14$ when $n$ becomes large. Since $y$ is thus estimating a value which tends to one extreme 
of its range, it is to be expected that the distribution of $y$ should be very skewed, and the effect becomes 
more pronounced as sample size increases. 










The position is comparable with that of the distribution of the squared sample multiple correlation 
coefficient, $R^{2}$, in samples from a multinormal population. If the population multiple correlation 
parameter is non-zero, $R^{2}$ is asymptotically normal with variance of order $n^{-1}$. But if the parameter is zero, 
$R^{2}$, like $y$ in our discussion above, is essentially estimating a parameter at one extreme of its range, has 
variance of order $n^{-2}$, and is non-normally distributed. 


5. Finally, an implication of the result should be briefly mentioned. In testing the equality of the 
values of $\pi$ in two populations, we use a large sample standard error test based on the fact that 
$$(p_1 - p_2)\Big/\left\{\pi(1-\pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right\}^{1/2}$$
is asymptotically a standardized normal variate. If we are testing the composite hypothesis, $\pi$ being 
unspecified, we estimate $\pi(1-\pi)$ by $y$ as above, calculated from the pooled samples, with $n = n_1 + n_2$. 
Our result implies that, if $\pi = \tfrac12$, $y$ estimates its expectation with greater precision than if $\pi \neq \tfrac12$, and this 
presumably improves the accuracy of the normal approximation using the estimated standard error. 


Student’s distribution and Riemann’s elliptic geometry 


By AUREL WINTNER 
Johns Hopkins University 


The density of probability for Student’s ratio is* 
o/(1+a*)t*, (1) 


where $k$ (> 1) is the number of the variables and the value of $c = c_k$ is determined by the condition that 
the integral of the function (1) over the $x$-line is 1. The algebraic structure of the function (1) suggests 
a simple geometrical approach† to Student’s distribution. In fact, such an approach leads to the 
following interpretation: 

If $k$ is denoted by $n+2$, and if $R_n$ (where $n>0$, hence $k>2$) is the space of Riemann’s $n$-dimensional 
elliptic geometry, then refer $R_n$ to his normal co-ordinates $x_1, \ldots, x_n$, where $-\infty < x_i < \infty$; consider on 
$R_n$ that distribution which represents equidistribution (in terms of the co-ordinates $x_i$); denote by $L$ 
a line through the origin of $R_n$; finally, denote by $x$, where $-\infty < x < \infty$, the abscissa of that point of $L$ 
which is the orthogonal projection of an arbitrary point $(x_1, \ldots, x_n)$ of $R_n$. Then the function (1) on $L$ is 
the density of probability of the orthogonal projection of the equidistribution on $R_n$. This can be seen as follows: 

Stereographic projection of the Euclidean $n$-space shows that the squared line-element on $R_n$ is 
$$ds^{2} = g\sum_{i=1}^{n} dx_i^{2}, \quad\text{where}\quad g^{-1/2} = 1 + \sum_{i=1}^{n} x_i^{2}, \qquad (2)$$
if the ‘diameter’ of $R_n$ is chosen to be the unit of length. But if 
$$ds^{2} = \sum_{i=1}^{n}\sum_{j=1}^{n} g_{ij}(x_1, \ldots, x_n)\,dx_i\,dx_j$$
is any (positive-definite) Riemannian metric, then the Riemannian volume element is $(\det g_{ij})^{1/2}$ times 
the Euclidean volume element. Hence it is clear from (2), where $g_{ij} = g\delta_{ij}$, that the Riemannian volume 
of that portion (infinite slab of thickness $dx$) of the $(x_1, \ldots, x_n)$-space which is contained between the two 
hyperplanes $x_n = x$, $x_n = x+dx$ is $dx$ times the value of the $(n-1)$-fold integral 


$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}(1+x^{2}+r^{2})^{-n}\,dx_1 \ldots dx_{n-1}, \qquad (3)$$
where $r^{2} = x_1^{2} + \ldots + x_{n-1}^{2}$. Accordingly, the function (3) of $x$, where $-\infty < x < \infty$, is the density of the 
equidistribution projected on $L$, if $L$ is chosen to be the $x_n$-axis. 
In order to evaluate the integral, use (ordinary) polar co-ordinates, $r$ and $n-2$ angles, in the Euclidean 
$(n-1)$-space $(x_1, \ldots, x_{n-1})$. These reduce the $(n-1)$-fold integral (3) to a constant multiple of an integral 
over $r$ alone. In fact, the result is 


$$C\int_{0}^{\infty}(1+x^{2}+r^{2})^{-n}\,r^{n-2}\,dr,$$


* See, for example, Kendall (1952, chapter 10) where further references will be found. 
† In this connexion see Wintner (1940, pp. 287-97 and 1947, pp. 168-73). 













where $C = C_{n-2}$, being the contribution of the $n-2$ angles, is the Euclidean measure of the unit sphere 
($r = 1$). But if $x$, hence $X = (1+x^{2})^{1/2}$, is fixed and if the integration variable $r$ is replaced by $t$, where 
$r = Xt$, then the last integral appears in the form 
$$C\int_{0}^{\infty} X^{-2n}(1+t^{2})^{-n}\,(Xt)^{n-2}\,X\,dt,$$
which, since $X$ is independent of $t$, is a constant multiple of $X^{-2n}X^{n-1} = 1/X^{n+1}$ or, since $X = (1+x^{2})^{1/2}$, 
of Student’s density (1), where $k = n+2$. 
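The scaling argument can be checked numerically: the radial integral, multiplied by $X^{n+1} = (1+x^{2})^{(n+1)/2}$, should not depend on $x$. The following Python sketch uses a plain midpoint rule after mapping the half-line to $(0, 1)$; the function names, the substitution $r = s/(1-s)$ and the number of steps are choices made for this illustration, and $n \ge 2$ is assumed so that the integral is not degenerate.

```python
def radial_integral(x: float, n: int, steps: int = 100_000) -> float:
    """Midpoint-rule value of the integral of (1 + x^2 + r^2)^(-n) r^(n-2) dr over (0, inf).

    The half-line is mapped to (0, 1) by the substitution r = s / (1 - s).
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        r = s / (1.0 - s)
        jacobian = 1.0 / (1.0 - s) ** 2
        total += (1.0 + x * x + r * r) ** (-n) * r ** (n - 2) * jacobian
    return total * h


def scaled_integral(x: float, n: int) -> float:
    """Radial integral multiplied by X^(n+1), which should be free of x."""
    return radial_integral(x, n) * (1.0 + x * x) ** ((n + 1) / 2)


for n in (2, 3):
    print(n, [round(scaled_integral(x, n), 6) for x in (0.0, 1.5, -3.0)])
```

For $n = 2$ and $n = 3$ the constants are $\pi/4$ and $\tfrac14$ respectively, which the printed values reproduce to the accuracy of the quadrature.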


REFERENCES 


KENDALL, M. G. (1952). The Advanced Theory of Statistics, 1, 5th ed. London: Charles Griffin and Co. 
Ltd. 

WINTNER, A. (1940). Astronomical Papers Dedicated to Elis Strömgren. Copenhagen. 

WINTNER, A. (1947). The Fourier Transforms of Probability Distributions. Baltimore. 


Some interrelations among compound and generalized distributions 


By JOHN GURLAND 
Iowa State College 


1. INTRODUCTION 


It is well known (cf. Greenwood & Yule, 1920; Feller, 1943) that the Pascal (negative binomial) 
distribution may be regarded as a compound distribution if the compounding is effected through a gamma 
distribution. More recently it has been noted (cf. Jones & Mollison, 1948; Quenouille, 1949) that the 
Pascal distribution may also be regarded as a generalized Poisson distribution if the generalizing is 
effected through a logarithmic distribution. 

The development in the present paper consists of the following: first, a convenient symbolism form- 
alizing the notions of compound and generalized distributions is introduced; then a simple relation 
that exists for a certain class of compound and generalized distributions is formulated in a theorem. 
Some examples relating to the theorem are noted. Next, the relation involving the Poisson, gamma and 
logarithmic distributions indicated in the papers referred to above, is extended one stage further, and 
the general extension for a finite number of stages is indicated. 


2. CoMPOUND AND GENERALIZED DISTRIBUTIONS 


Compound and generalized Poisson variables have been studied by Feller (1943) and Satterthwaite 
(1942). More recently, compound and generalized variables other than Poisson have also been con- 
sidered (Skellam, 1952; Feller, 1950; Gurland, 1957). The present definitions and notation are given to 
facilitate manipulations with such distributions. 


Definition 1. Compound distribution 


Let the random variable $X_1$ have the distribution function $F_1(x_1 \mid \theta)$ for a given value of the 
parameter $\theta$. Suppose now that $\theta$ is regarded as a random variable $X_2$, say, with distribution function $F_2(x_2)$. 
Denote by $X_1 \wedge X_2$ the random variable with distribution function 
$$\int_{-\infty}^{\infty} F_1(x_1 \mid cx_2)\,dF_2(x_2), \qquad (1)$$
where $c$ is a constant which is arbitrary or restricted in some prescribed sense. Then the random variable 
$X_1 \wedge X_2$ (uniquely defined here except for the constant $c$) is called a compound $X_1$ variable with respect 
to the ‘compounder’ $X_2$. 

The advantage of introducing the constant $c$ will become apparent in the formulation of the theorem. 
It should also be remarked that in the above definition the constant could be incorporated in the $F_2$ 
term instead of in the $F_1$ term, since (1) can be written equivalently as 
$$\int_{-\infty}^{\infty} F_1(x_1 \mid x_2)\,dF_2\!\left(\frac{x_2}{c}\right).$$










If several parameters $\theta_1, \theta_2, \ldots, \theta_p$, say, appear in the distribution function $F_1$, then the notation $X_1 \wedge X_2$ 
is inadequate to specify which of the parameters is involved in the compounding operation. In such 
a case, to avoid ambiguity, we would write 
$$X_1(\theta_1, \theta_2, \ldots, \theta_p) \underset{\theta_1}{\wedge} X_2,$$
where $\theta_1$, say, is the parameter specifically involved in the compounding operation. 
As an example of definition 1, let $X_1$ be a Poisson variable with p.g.f. (probability generating function) 
$e^{\theta(z-1)}$, and $X_2$ be a gamma variable with c.f. (characteristic function) $(1 - it/\alpha)^{-\lambda}$. Then the p.g.f. of $X_1 \wedge X_2$ 
is given by 
$$\frac{\alpha^{\lambda}}{\Gamma(\lambda)}\int_{0}^{\infty} e^{cx(z-1)}\,e^{-\alpha x}\,x^{\lambda-1}\,dx = \left[1 - \frac{c}{\alpha}(z-1)\right]^{-\lambda} \quad (c>0,\ \alpha>0,\ \lambda>0). \qquad (2)$$
If we consider a Pascal variable with p.g.f. 
$$[1 - p(z-1)]^{-k} \quad (p>0,\ k>0), \qquad (3)$$
it is evident that for each set of values $c$, $\alpha$, $\lambda$ in (2) a set of values $p$, $k$ can be assigned in (3) such that 
expressions (2) and (3) have the same value whatever be $z$. Also, to any set of values $p$, $k$ in (3) there 
correspond values $c$, $\alpha$, $\lambda$ (not unique here) such that (2) and (3) have the same value whatever be $z$. 
We shall call two such distributions equivalent and now give the following formal definition. 


Definition 2. Equivalent distributions 
Suppose the random variables $X_1$, $X_2$ have distribution functions $F_1(x \mid \alpha)$, $F_2(x \mid \beta)$ respectively. 
($\alpha$ and/or $\beta$ may be multidimensional.) If for each $\alpha$ there exists some $\beta$ and for each $\beta$ there exists some 
$\alpha$ such that $F_1(x \mid \alpha) = F_2(x \mid \beta)$ whatever be $x$, the random variables $X_1$ and $X_2$ are said to be equivalent, 
and we write $X_1 \sim X_2$. 


Occasionally it is convenient to represent a random variable by the name of its corresponding 
distribution. Thus, in the case of Poisson, gamma and Pascal distributions the meaning of the relation 
$$\text{Poisson} \wedge \text{gamma} \sim \text{Pascal} \qquad (4)$$
should be clear. 
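Relation (4) can be illustrated numerically by compounding a Poisson distribution through a gamma density and comparing the result with the Pascal probabilities implied by the p.g.f. (3), taking $p = c/\alpha$ and $k = \lambda$. The sketch below is an illustration only: the function names, the midpoint quadrature and the truncation point of the integral are assumptions of this example.

```python
from math import exp, lgamma, log


def pascal_pmf(j: int, p: float, k: float) -> float:
    """P(N = j) for the Pascal variable with p.g.f. [1 - p(z - 1)]^(-k)."""
    return exp(lgamma(k + j) - lgamma(k) - lgamma(j + 1)
               + j * log(p / (1.0 + p)) - k * log(1.0 + p))


def compound_pmf(j: int, c: float, alpha: float, lam: float,
                 upper: float = 40.0, steps: int = 100_000) -> float:
    """P(N = j) for Poisson(c*x) with x following a gamma(alpha, lam) density.

    The integral over x is truncated at `upper` and evaluated by the midpoint rule.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        log_poisson = -c * x + j * log(c * x) - lgamma(j + 1)
        log_gamma = lam * log(alpha) + (lam - 1.0) * log(x) - alpha * x - lgamma(lam)
        total += exp(log_poisson + log_gamma)
    return total * h


c, alpha, lam = 2.0, 1.5, 2.5
for j in (0, 1, 4):
    print(j, round(compound_pmf(j, c, alpha, lam), 6), round(pascal_pmf(j, c / alpha, lam), 6))
```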
The following definition of a generalized distribution is given formally in terms of probability gener- 
ating functions because of greater generality and ease of manipulation. 


Definition 3. Generalized distribution 


Let the random variables $X_1$, $X_2$ have p.g.f.’s $g_1(z)$, $g_2(z)$ respectively. Denote by $X_1 \vee X_2$ the random 
variable with p.g.f. $g_1(g_2(z))$. Then $X_1 \vee X_2$ is called a generalized $X_1$ variable with respect to the 
‘generalizer’ $X_2$. 

It may be remarked here that $X_1$, $X_2$ need not be discrete variables as assumed, for instance, by Feller 
(1943). For $X_1$ the p.g.f. is merely taken as $E z^{X_1}$, and likewise, of course, for $X_2$. 

The following theorem gives a simple relation for a certain class of compound and generalized dis- 
tributions. 


THEOREM. Let $X_1$ be a random variable with p.g.f. $[h(z)]^{\theta}$, where $\theta$ is a given parameter. Suppose now 
$\theta$ is regarded as a random variable $X_2$, say, with distribution function $F_2$ and p.g.f. $g_2$. Then, whatever be $X_2$, 
$$X_1 \wedge X_2 \sim X_2 \vee X_1.$$
Proof. The p.g.f. of $X_1 \wedge X_2$ is given by 
$$\int_{-\infty}^{\infty}[h(z)]^{cx}\,dF_2(x)$$
and that of $X_2 \vee X_1$ is given by 
$$g_2(g_1(z)) = \int_{-\infty}^{\infty}[h(z)]^{\theta x}\,dF_2(x).$$
It is obvious, of course, that these p.g.f.’s exist, at least, for $z = e^{it}$, and are equal when $c = \theta$; hence the 
theorem is proved. 
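A numerical illustration of the theorem, for the case in which both variables are Poisson, may be helpful. In the sketch below $X_1$ has p.g.f. $[h(z)]^{\theta}$ with $h(z) = e^{z-1}$ and $X_2$ is Poisson with mean `mu`; the compound p.g.f. is summed term by term with $c = \theta$ and compared with $g_2(g_1(z))$. The function names, parameter values and truncation of the series are assumptions made for this example.

```python
from math import exp


def compound_pgf(z: float, theta: float, mu: float, terms: int = 200) -> float:
    """P.g.f. of Poisson compounded by Poisson(mu): sum over x of P(X2 = x) [h(z)]^(theta x)."""
    total = 0.0
    weight = exp(-mu)                      # P(X2 = 0)
    for x in range(terms):
        total += weight * exp(theta * x * (z - 1.0))
        weight *= mu / (x + 1)             # P(X2 = x + 1) from P(X2 = x)
    return total


def generalized_pgf(z: float, theta: float, mu: float) -> float:
    """P.g.f. g2(g1(z)) of the Poisson(mu) variable generalized by the Poisson(theta) variable."""
    g1 = exp(theta * (z - 1.0))
    return exp(mu * (g1 - 1.0))


for z in (0.0, 0.3, 0.9, 1.0):
    print(z, compound_pgf(z, 1.7, 2.5), generalized_pgf(z, 1.7, 2.5))
```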


Some examples illustrating the above theorem are the following: 


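$$\text{Poisson} \wedge \text{Poisson} \sim \text{Poisson} \vee \text{Poisson}, \qquad (5)$$
$$\text{Pascal}(k, p) \underset{k}{\wedge} \text{Poisson} \sim \text{Poisson} \vee \text{Pascal}, \qquad (6)$$
$$\text{Pascal}(k, p) \underset{k}{\wedge} \text{gamma} \sim \text{gamma} \vee \text{Pascal}. \qquad (7)$$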













It is, of course, apparent from the theorem that two different mathematical interpretations may underlie 
the same distribution. Thus, as already pointed out by Feller (1943), the Neyman Type A contagious 
distribution may be interpreted as a compound Poisson or a generalized Poisson distribution. This is 
epitomized in (5) above. 

Relation (7) is also interesting in that the gamma variable appearing in the right-hand member is 
a continuous random variable. 


3. EXTENSION OF A CERTAIN RELATION 
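As mentioned in § 1 it is known that the following equivalence holds: 
$$\text{Poisson} \wedge \text{gamma} \sim \text{Poisson} \vee \text{logarithmic}. \qquad (8)$$
An immediate extension of (8) is the following: 
$$\text{Pascal}(k, p) \underset{k}{\wedge} \text{gamma} \sim \text{Pascal} \vee \text{logarithmic}. \qquad (9)$$
To see this, let $X_i$ $(i = 1, 2)$ be Pascal variables with p.g.f.’s $[1 - p_i(z-1)]^{-k_i}$ $(i = 1, 2)$; let $X_3$ be a 
logarithmic variable with p.g.f. $-\alpha\log(1-\tau z)$ and $X_4$ be a gamma variable with c.f. $(1 - it/\beta)^{-\lambda}$. The 
p.g.f. $g(z)$, say, of $X_1(p_1, k_1) \underset{k_1}{\wedge} X_4$ is given by 
$$g(z) = \frac{\beta^{\lambda}}{\Gamma(\lambda)}\int_{0}^{\infty}[1 - p_1(z-1)]^{-cx}\,e^{-\beta x}\,x^{\lambda-1}\,dx = \left[1 + \frac{c}{\beta}\log(1+p_1) + \frac{c}{\beta}\log\left(1 - \frac{p_1 z}{1+p_1}\right)\right]^{-\lambda}.$$
On the other hand, the p.g.f. $h(z)$, say, of $X_2 \vee X_3$ is given by 
$$h(z) = [1 + p_2 + \alpha p_2\log(1-\tau z)]^{-k_2},$$
and the required equivalence $X_1 \wedge X_4 \sim X_2 \vee X_3$ is immediately apparent. 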

It is also possible to extend (9) in the same manner as (8) by letting the gamma and logarithmic 
variables play the same role as in the previous extension. This extension could be carried through any 
finite number $n$, say, of stages. One might refer to the new random variables obtained in this manner as 
$n$th stage compound Poisson and $n$th stage generalized Poisson respectively. In such a context the Pascal 
distribution could be referred to as a first stage compound Poisson or generalized Poisson distribution; 
whereas the variable $\text{Pascal}(k, p) \underset{k}{\wedge} \text{gamma}$ or $\text{Pascal} \vee \text{logarithmic}$ could be referred to as second-stage 
variables of this type. 

To sketch the method of proof for the general case let $Y_i$ be an $i$th stage compound Poisson and $W_i$ be 
an $i$th stage generalized Poisson variable. Let the corresponding compounder and generalizer be 
gamma $(\beta_i, \lambda_i)$ and logarithmic $(\tau_i)$ respectively. 

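Repeated application of the compounding operation in (2) shows that the p.g.f. 
$$\text{of } Y_1 \text{ is } \left[1 - \frac{c_1}{\beta_1}(z-1)\right]^{-\lambda_1},$$
$$\text{of } Y_2 \text{ is } \left[1 + \frac{c_2}{\beta_2}\log\left\{1 - \frac{c_1}{\beta_1}(z-1)\right\}\right]^{-\lambda_2},$$
$$\text{of } Y_3 \text{ is } \left[1 + \frac{c_3}{\beta_3}\log\left\{1 + \frac{c_2}{\beta_2}\log\left(1 - \frac{c_1}{\beta_1}(z-1)\right)\right\}\right]^{-\lambda_3},$$
etc. 

To write the p.g.f. of $W_i$ let the original Poisson variable have p.g.f. $e^{\theta(z-1)}$ and write the p.g.f. of 
logarithmic $(\tau_i)$ as $-\alpha_i\log(1-\tau_i z)$, where $\alpha_i\log(1-\tau_i) = -1$ $(0 < \tau_i < 1)$. 
Then the p.g.f. 
$$\text{of } W_1 \text{ is } e^{-\theta}(1-\tau_1 z)^{-\alpha_1\theta},$$
$$\text{of } W_2 \text{ is } e^{-\theta}[1 + \tau_1\alpha_2\log(1-\tau_2 z)]^{-\alpha_1\theta},$$
$$\text{of } W_3 \text{ is } e^{-\theta}[1 + \tau_1\alpha_2\log\{1 + \tau_2\alpha_3\log(1-\tau_3 z)\}]^{-\alpha_1\theta},$$
etc. 
Since 
$$e^{-\theta} = (1-\tau_1)^{\alpha_1\theta} \quad (0 < \tau_1 < 1)$$
$$= (1+\pi_1)^{-\alpha_1\theta}, \text{ say} \quad (\pi_1 > 0),$$
it is clear that for each finite $k = 1, 2, \ldots$, the p.g.f.’s of $Y_k$ and $W_k$ are of the same form and hence yield 
equivalent random variables. 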
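The first two stages of this equivalence can be checked numerically. In the sketch below the p.g.f.’s of $W_1$ and $W_2$ are compared with a Pascal p.g.f. and with the second-stage compound form respectively. The parameter correspondences used ($p = \tau_1/(1-\tau_1)$, $k = \alpha_1\theta$ at the first stage; $c_1/\beta_1 = \tau_2/(1-\tau_2)$, $c_2/\beta_2 = \alpha_2\tau_1/(1-\tau_1)$, $\lambda_2 = \alpha_1\theta$ at the second) are worked out for this illustration and are not stated in the text; all function names are hypothetical.

```python
from math import exp, log


def logarithmic_alpha(tau: float) -> float:
    """alpha satisfying alpha * log(1 - tau) = -1, so that -alpha*log(1 - tau*z) is a p.g.f."""
    return -1.0 / log(1.0 - tau)


def w1_pgf(z: float, theta: float, tau1: float) -> float:
    """First-stage generalized Poisson p.g.f."""
    return exp(-theta) * (1.0 - tau1 * z) ** (-logarithmic_alpha(tau1) * theta)


def pascal_pgf(z: float, p: float, k: float) -> float:
    """Pascal p.g.f. [1 - p(z - 1)]^(-k)."""
    return (1.0 - p * (z - 1.0)) ** (-k)


def w2_pgf(z: float, theta: float, tau1: float, tau2: float) -> float:
    """Second-stage generalized Poisson p.g.f."""
    a1, a2 = logarithmic_alpha(tau1), logarithmic_alpha(tau2)
    return exp(-theta) * (1.0 + tau1 * a2 * log(1.0 - tau2 * z)) ** (-a1 * theta)


def y2_pgf(z: float, ratio1: float, ratio2: float, lam2: float) -> float:
    """Second-stage compound Poisson p.g.f., with ratio_i standing for c_i / beta_i."""
    return (1.0 + ratio2 * log(1.0 - ratio1 * (z - 1.0))) ** (-lam2)


def matching_y2_parameters(theta: float, tau1: float, tau2: float):
    """Assumed correspondence between the parameters of W2 and those of Y2."""
    a1, a2 = logarithmic_alpha(tau1), logarithmic_alpha(tau2)
    return tau2 / (1.0 - tau2), a2 * tau1 / (1.0 - tau1), a1 * theta


theta, tau1, tau2 = 1.3, 0.4, 0.6
r1, r2, lam2 = matching_y2_parameters(theta, tau1, tau2)
for z in (0.0, 0.25, 0.7, 1.0):
    print(z, w2_pgf(z, theta, tau1, tau2), y2_pgf(z, r1, r2, lam2))
```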










4. SUMMARY 


A simple relation between a certain class of compound and generalized distributions is pointed out 
and a few examples are given. In particular, some distributions may be regarded both as compound 
and as generalized. Finally, a relation between the Poisson, gamma and logarithmic distributions is 
extended, which involves the Pascal distribution, and a generalization of this extension is indicated. 


REFERENCES 


FELLER, W. (1943). On a general class of contagious distributions. Ann. Math. Statist. 14, 389-400. 

FELLER, W. (1950). An Introduction to Probability Theory and its Applications. New York: John 
Wiley and Sons. 

GREENWOOD, M. & YULE, G. UDNY (1920). An inquiry into the nature of frequency distributions 
representative of multiple happenings with particular reference to the occurrence of multiple attacks 
of disease or repeated accidents. J. R. Statist. Soc. 83, 255-79. 

GURLAND, J. (1957). On Beall and Rescia’s generalization of Neyman’s contagious distributions. 
Submitted for publication. 

JONES, P. C. T. & MOLLISON, J. E. (1948). A technique for the quantitative estimation of soil 
micro-organisms. J. Gen. Micr. 2, 54-69. 

QUENOUILLE, M. H. (1949). A relation between the logarithmic, Poisson, and negative binomial series. 
Biometrics, 5, 162-4. 

SATTERTHWAITE, F. E. (1942). Generalized Poisson distribution. Ann. Math. Statist. 13, 410-17. 

SKELLAM, J. G. (1952). Studies in statistical ecology. I. Spatial pattern. Biometrika, 39, 346-62. 


A note on tests of significance for linear functional relationships 


By M. S. BARTLETT 


University of Manchester 


Williams (1955) has recently developed further the exact significance tests for a single hypothetical 
non-null canonical variate, first proposed by him (1952) in special cases and considered further by the 
author (1951). He has also, however, advocated analogous tests for hypothetical linear relations or null 
variates in situations where these tests are not correct, and the purpose of the present note is to draw 
attention to this. 

It seems convenient to summarize these tests in terms of the factorizations discussed in my 1951 
paper. For simplicity the case of one null variate will be considered, in contrast with one non-null 
variate (the more general case being $r$ null variates, in contrast with $r$ non-null variates). It was pointed 
out in my earlier paper that if the regression relation of a vector variate $x$ with $p$ components $x_i$ 
$(i = 1, \ldots, p)$ on another variate $y$ with $q$ components $(p \le q)$ gives the $p$ canonical roots $R_i^{2}$ $(i = 1, \ldots, p)$, so that 
$$\Lambda(n, p, q) = \prod_{i=1}^{p}(1 - R_i^{2}), \qquad (1)$$
and if a hypothetical linear combination $\xi$ of the $x_i$ gives the corresponding measure of association (more 
strictly, disassociation) $1-\phi$, then the exact criterion 
$$\Lambda'(n-1, p-1, q) = \Lambda/(1-\phi)$$
may be factorized further into 
$$\Lambda' = \frac{1-R_1^{2}}{1-\phi}\prod_{j>1}(1-R_j^{2}), \qquad (2)$$
where the two factors in (2) may be tested approximately. In place of the ‘approximate’ factorization 
in (2), ‘exact’ factorizations are possible as 
$$\Lambda' = \Lambda''(n-1, p-1, 1)\,\Lambda'''(n-2, p-1, q-1),$$
$$\Lambda' = \Lambda^{\mathrm{iv}}(n-1, p-1, q-1)\,\Lambda^{\mathrm{v}}(n-q, p-1, 1), \qquad (3)$$
in terms of the projection of $\xi$ in the sample space of the $q$ variates $y_k$ $(k = 1, \ldots, q)$. 
Consider now the case of one hypothetical null variate $\eta$ (in relation to the set $y$), and let the 
corresponding measure of association of this with the other set $y$ be $1-\lambda$. Williams suggests that $\eta$ will define 
$p-1$ variates orthogonal to $\eta$ in the sample space, $z$, say, and that if the relation of $z$ with $y$ is measured 
by $\Lambda'(n, p-1, q)$ we may consider 
$$\Lambda''(n-p+1, 1, q) = \Lambda(n, p, q)/\Lambda'(n, p-1, q). \qquad (4)$$
If, moreover, the relation with $x$ (in the reduced space orthogonal to $z$) of the projection $\zeta$, say, of $z$ 
on $y$ is given by $\Lambda'''(n-p+1, 1, p-1)$, then the further factorization of $\Lambda''$ in (4) is possible, 
$$\Lambda'' = \Lambda'''(n-p+1, 1, p-1)\,\Lambda^{\mathrm{iv}}(n-2p+2, 1, q-p+1), \qquad (5)$$
with an alternative factorization (the remaining variables in $y$ being taken first) 
$$\Lambda'' = \Lambda^{\mathrm{v}}(n-p+1, 1, q-p+1)\,\Lambda^{\mathrm{vi}}(n-q, 1, p-1). \qquad (6)$$

One possible minor point of obscurity may be cleared. It is not immediately obvious that the total 
number of degrees of freedom in the criterion $\Lambda'(n, p-1, q)$ in (4) is still $n$, as the variates in $z$ are already 
orthogonal to $\eta$. However, as by hypothesis $\eta$ is randomly orientated with respect to $y$, the stochastic 
part of $z$ will be randomly orientated with respect to $y$, and there is no reduction in $n$. Nevertheless, this 
perhaps indicates the sort of difficulty that will arise, for in general, of course, $z$ is not random with respect 
to $y$, and under such non-null conditions for $z$ the relations of the variates are presumably more 
complicated. 

That this is so may be demonstrated in a simple case, but first it is instructive to consider the more 
direct factorization corresponding to the factorization in (2) for testing $\xi$. The simple factor $1-\lambda$ is of 
the type $\Lambda(n, 1, q)$ and is expected to be insignificant. Moreover, we may write 
$$1-\lambda = \left[\frac{1-\lambda}{1-R_p^{2}}\right](1-R_p^{2}), \qquad (7)$$
where the significance of $R_p^{2}$ would indicate the existence of no null variate, and the significance of the 
other factor that $\eta$ is not the right null-variate. Unfortunately, while the overall test based on $1-\lambda$, 
when available, is exact (on the usual normality assumptions), the two separate factors in (7) are rather 
less convertible into approximate $\chi^{2}$ components than in (2), for the crudest $\chi^{2}$ testing of $R_p^{2}$ would rely 
on all the remaining $p-1$ variates $z$ having close relation with $y$, and this need not be the case. However, 
as actual insignificance of some of the other canonical correlations would be evidence of more than one 
null variate (cf. the discussion by Tintner (1946) and Bartlett (1948) of two null variates), a cautious 
use of such $\chi^{2}$ approximations may still be convenient if a further factorization of $1-\lambda$ as in (7) is being 
considered. 

The difficulty with any exact analogues of the factorizations in (3) is that, while we may eliminate 
$p-1$ hypothetical canonical variates if these are given a priori (and so test the goodness of fit of these 
variates), the choice of $p-1$ variates orthogonal to $\eta$ in the sample space, when only $\eta$ is given a priori, 
is not equivalent to this. 

It might be noticed that if in (7) the condition $p \le q$ were dropped and the case $q = 1$, say, considered, 
$R_p$ is zero, and no factorization of $1-\lambda$ arises. In contrast, Williams gives an analysis of variance table 
(top of p. 375) in which the sum of squares $\lambda$ for $\eta$ is tested not against a residual sum of squares $1-\lambda$, 
but against the residual $1-R^{2}$, a quantity quite irrelevant to the analysis of $\eta$ and one, moreover, 
which could be made as near zero as we wished merely by increasing the irrelevant association with $y$ 
of the $p-1$ other canonical variates. A very special case which may bring this point out even more 
clearly is that of $p = 2$, when there are two dependent variates $x_1$ and $x_2$, of which $x_1$, say, is the 
hypothetical null variate. While it might be of interest to know whether $x_2$ is also uncorrelated with $y$, there 
is no justification for inflating the straightforward significance of $x_1$, as measured by $\lambda$, if $x_2$ is correlated 
with $y$. 


I learn from correspondence with Dr Williams that he fully agrees with my criticisms, and he notes 
that §§ VII and VIII of his paper will consequently require amendment. 


REFERENCES 


BARTLETT, M. S. (1948). A note on the statistical estimation of supply and demand relations from 
time-series. Econometrica, 16, 323. 

BARTLETT, M. S. (1951). The goodness of fit of a single hypothetical discriminant function in the case 
of several groups. Ann. Eugen., Lond., 16, 198. 

TINTNER, G. (1946). Multiple regression for systems of equations. Econometrica, 14, 5. 

WILLIAMS, E. J. (1952). Some exact tests in multivariate analysis. Biometrika, 39, 17. 

WILLIAMS, E. J. (1955). Significance tests for discriminant functions and linear functional 
relationships. Biometrika, 42, 360. 










The moments of the Leipnik distribution 


By M. G. KENDALL 
Research Techniques Unit, London School of Economics 
1. The distribution 
$$dF = \frac{(1-r^{2})^{\frac12 n-\frac12}\,dr}{B\{\tfrac12 n+\tfrac12,\ \tfrac12\}\,(1+\rho^{2}-2\rho r)^{\frac12 n}} \quad (-1 \le r \le 1) \qquad (1)$$
was obtained by Leipnik (1947), following a method due to Madow (1945), as an approximation to the 
distribution of the first serial correlation, circularly defined, with known mean, in samples of $n$ from a 
Markoff scheme. Recent work by Daniels (1956) and Jenkins (1956) shows that approximations to the 
distributions of the coefficient with fitted mean, and non-circularly defined, have moments which are 
simply expressible in terms of the moments of (1). 

Leipnik himself found the mean and second moment of the distribution by a complicated procedure. 
Jenkins (1956) has recently obtained two more by a method which is also rather complicated. In this 
note I give a general expression which enables moments of all orders to be written down. 


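2. We have 
$$\frac{1}{n}\frac{\partial\mu_k'}{\partial\rho} = \int_{-1}^{1}\frac{r^{k}(r-\rho)(1-r^{2})^{\frac12 n-\frac12}\,dr}{B\{\tfrac12 n+\tfrac12,\ \tfrac12\}\,(1+\rho^{2}-2\rho r)^{\frac12 n+1}} = \int_{-1}^{1}\frac{r^{k}(r-\rho)}{1+\rho^{2}-2\rho r}\,dF.$$
Writing $r = \{1+\rho^{2}-(1+\rho^{2}-2\rho r)\}/(2\rho)$, so that $r-\rho = \{1-\rho^{2}-(1+\rho^{2}-2\rho r)\}/(2\rho)$, this becomes 
$$\frac{1}{n}\frac{\partial\mu_k'}{\partial\rho} = \frac{1-\rho^{2}}{2\rho}\int_{-1}^{1}\frac{r^{k}\,dF}{1+\rho^{2}-2\rho r} - \frac{1}{2\rho}\mu_k',$$
and, treating one factor $r$ of $r^{k}$ in the remaining integral in the same way, 
$$\int_{-1}^{1}\frac{r^{k}\,dF}{1+\rho^{2}-2\rho r} = \frac{1+\rho^{2}}{2\rho}\int_{-1}^{1}\frac{r^{k-1}\,dF}{1+\rho^{2}-2\rho r} - \frac{1}{2\rho}\mu_{k-1}'.$$
Eliminating the integrals between these two relations and the first relation with $k-1$ in place of $k$ gives 
$$\left(\frac{1}{n}\frac{\partial}{\partial\rho} + \frac{1}{2\rho}\right)\mu_k' = \left(\frac{1+\rho^{2}}{2\rho n}\frac{\partial}{\partial\rho} + \frac{1}{2}\right)\mu_{k-1}'. \qquad (2)$$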


3. This relation can be used to build up the moments from $\mu_0'$ (= 1) and $\mu_1'$ (= $n\rho/(n+2)$). But we 
can also use it to derive explicit expressions. We note in the first place that if $P_k$ is a polynomial in $\rho$ of 
degree $k$ the solution of 


is a polynomial of order $k+1$. It follows that $\mu_k'$ is a polynomial in $\rho$ of order $k$. Moreover, it follows from 
(2) and the lower values of $\mu_k'$ that even-order moments contain only even powers of $\rho$ and odd-order 
moments only odd powers of $\rho$. 

Let 
$$\mu_k' = \sum_{m=0}^{k} a_{km}\rho^{m}.$$

On differentiating (2) $m$ times and putting $\rho = 0$ we find 
$$a_{km} = \frac{1}{n+2m}\{(m+1)\,a_{k-1,\,m+1} + (n+m-1)\,a_{k-1,\,m-1}\}. \qquad (3)$$


4. We have $\mu_0' = 1$, $a_{00} = 1$, and hence 
$$a_{11} = \frac{n}{n+2}, \qquad (4)$$
giving $\mu_1' = n\rho/(n+2)$. 

By successive application of (3) we find 


$$\mu_2' = \frac{1}{n+2} + \frac{n(n+1)\rho^{2}}{(n+2)(n+4)}, \qquad (5)$$
$$\mu_3' = \frac{3n\rho}{(n+2)(n+4)} + \frac{n(n+1)\rho^{3}}{(n+4)(n+6)}, \qquad (6)$$
$$\mu_4' = \frac{3}{(n+2)(n+4)} + \frac{6n(n+1)\rho^{2}}{(n+2)(n+4)(n+6)} + \frac{n(n+1)(n+3)\rho^{4}}{(n+4)(n+6)(n+8)}, \qquad (7)$$
$$\mu_5' = \frac{15n\rho}{(n+2)(n+4)(n+6)} + \frac{10n(n+1)\rho^{3}}{(n+4)(n+6)(n+8)} + \frac{n(n+1)(n+3)\rho^{5}}{(n+6)(n+8)(n+10)}, \qquad (8)$$
$$\mu_6' = \frac{15}{(n+2)(n+4)(n+6)} + \frac{45n(n+1)\rho^{2}}{(n+2)(n+4)(n+6)(n+8)} + \frac{15n(n+1)(n+3)\rho^{4}}{(n+4)(n+6)(n+8)(n+10)} + \frac{n(n+1)(n+3)(n+5)\rho^{6}}{(n+6)(n+8)(n+10)(n+12)}. \qquad (9)$$


Formulae (6) and (7) agree with Jenkins's results. The general law of formation of the terms will now be 
clear. The coefficients may be set out as follows: 

Order of                    Coefficient of powers of $\rho$
moment       0      1      2      3      4      5      6      7      8      9     10
    1        .      1
    2        1      .      1
    3        .      3      .      1
    4        3      .      6      .      1
    5        .     15      .     10      .      1
    6       15      .     45      .     15      .      1
    7        .    105      .    105      .     21      .      1
    8      105      .    420      .    210      .     28      .      1
    9        .    945      .   1260      .    378      .     36      .      1
   10      945      .   4725      .   3150      .    630      .     45      .      1

The numerical coefficient of $a_{km}$ is $k!/[m!\,\{\tfrac12(k-m)\}!\,2^{\frac12(k-m)}]$. The results are easily demonstrated by induction 
from (3). 
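
As a hedged illustration of how recurrence (3) builds up the coefficients, the sketch below evaluates $a_{km}$ in exact rational arithmetic for a numerical $n$. The function names `raw_moment_coeffs` and `numerical_coefficient` are illustrative choices, not part of the note; the second function simply evaluates the factorial expression for the numerical coefficient quoted above.

```python
from fractions import Fraction
from math import factorial


def raw_moment_coeffs(k_max, n):
    """Return a[k][m], the coefficient of rho**m in the k-th moment about zero,
    built from recurrence (3) starting from a[0][0] = 1."""
    n = Fraction(n)
    # One spare column so that a[k-1][m+1] is always a valid index.
    a = [[Fraction(0)] * (k_max + 2) for _ in range(k_max + 1)]
    a[0][0] = Fraction(1)
    for k in range(1, k_max + 1):
        for m in range(0, k + 1):
            up = (m + 1) * a[k - 1][m + 1]
            down = (n + m - 1) * a[k - 1][m - 1] if m >= 1 else 0
            a[k][m] = (up + down) / (n + 2 * m)
    return a


def numerical_coefficient(k, m):
    """k! / (m! * h! * 2**h) with h = (k - m)/2, for k - m even."""
    h = (k - m) // 2
    return factorial(k) // (factorial(m) * factorial(h) * 2 ** h)


if __name__ == "__main__":
    a = raw_moment_coeffs(6, 5)      # n = 5 chosen only as an example
    print(a[2][0], a[2][2])          # coefficients appearing in (5)
    print(numerical_coefficient(10, 2))
```

Exact fractions are used so that the comparison with the closed forms (5) to (9) is not blurred by rounding.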


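5. From (2) we find that the moment generating function $M'$ obeys the relation 

\[ \left(\frac1n\frac{\partial}{\partial\rho}+\frac{1}{2\rho}\right)\frac{\partial M'}{\partial\theta} = \left(\frac{1+\rho^2}{2\rho n}\frac{\partial}{\partial\rho}+\frac12\right)M'. \]

Putting $M'(\theta) = \exp\{n\rho\theta/(n+2)\}\,M(\theta)$ we obtain the m.g.f. $M$ of the mean moments. Identifying coefficients in $\theta^k$ we find 

\[ \frac{\partial\mu_{k+1}}{\partial\rho} + \frac{n}{2\rho}\mu_{k+1} = \frac{n+2-(n-2)\rho^2}{2\rho(n+2)}\frac{\partial\mu_k}{\partial\rho} - \frac{nk}{n+2}\mu_k + \frac{nk\{n+2-(n-2)\rho^2\}}{2\rho(n+2)^2}\mu_{k-1}. \tag{10} \]

Since $\mu_0 = 1$, $\mu_1 = 0$ we find 

\[ \mu_2 = \frac{1}{n+2} - \frac{n(n-2)\rho^2}{(n+2)^2(n+4)} \sim \frac{1-\rho^2}{n}, \tag{11} \]

\[ \mu_3 = -\frac{6n\rho}{(n+2)^2(n+4)} + \frac{2n(n-2)(3n-2)\rho^3}{(n+2)^3(n+4)(n+6)} \sim -\frac{6\rho(1-\rho^2)}{n^2}, \tag{12} \]

\[ \mu_4 = \frac{3}{(n+2)(n+4)} - \frac{6n(n^2-8n-4)\rho^2}{(n+2)^3(n+4)(n+6)} + \frac{3n(n-2)(n^3-14n^2+12n-8)\rho^4}{(n+2)^4(n+4)(n+6)(n+8)} \sim \frac{3(1-\rho^2)^2}{n^2}. \tag{13} \]

These also agree with Jenkins's results. It does not appear that general formulae are obtainable with 
the simplicity of the moments about zero. 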
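
The passage from moments about zero to moments about the mean can be checked numerically. The following sketch rebuilds the moments about zero from recurrence (3) and converts them by the usual binomial expansion; the names `raw_moments` and `central_moments`, and the trial values $n = 5$, $\rho = \tfrac12$, are assumptions made only for illustration.

```python
from fractions import Fraction
from math import comb


def raw_moments(k_max, n, rho):
    """Moments about zero mu'_0 .. mu'_k_max, from recurrence (3)."""
    n, rho = Fraction(n), Fraction(rho)
    a = {(0, 0): Fraction(1)}
    for k in range(1, k_max + 1):
        # Only powers of the same parity as k occur.
        for m in range(k % 2, k + 1, 2):
            up = (m + 1) * a.get((k - 1, m + 1), 0)
            down = (n + m - 1) * a.get((k - 1, m - 1), 0)
            a[(k, m)] = (up + down) / (n + 2 * m)
    return [
        sum(c * rho ** m for (kk, m), c in a.items() if kk == k)
        for k in range(k_max + 1)
    ]


def central_moments(raw):
    """Moments about the mean from moments about zero."""
    mean = raw[1]
    return [
        sum(comb(k, j) * raw[j] * (-mean) ** (k - j) for j in range(k + 1))
        for k in range(len(raw))
    ]


if __name__ == "__main__":
    mu = central_moments(raw_moments(4, 5, Fraction(1, 2)))
    print(mu[2], mu[3], mu[4])
```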










6. The limit of (10) with $n$ large may be written 

\[ n\mu_{k+1} = (1-\rho^2)\frac{\partial\mu_k}{\partial\rho} - 2k\rho\,\mu_k + k(1-\rho^2)\,\mu_{k-1}. \tag{14} \]

Thus, with $k = 4, 5$, we have 

\[ \mu_5 \sim -\frac{60\rho(1-\rho^2)^2}{n^3}, \tag{15} \]

\[ \mu_6 \sim \frac{15(1-\rho^2)^3}{n^3}. \tag{16} \]

Again the general law is clear. In standard measure the odd-order moments tend to zero, the even-order 
moments to those of the normal distribution; and thus the distribution tends to normality. 

7. It may be of interest to record another recurrence relation between the moments, namely, 

\[ \mu'_{k+1}(n) = \mu'_{k-1}(n) + \frac{n+1}{n+2}\{2\rho\,\mu'_k(n+2) - (1+\rho^2)\,\mu'_{k-1}(n+2)\}, \tag{17} \]

where $\mu'(n)$ refers to the moment about the origin based on a sample of $n$. 


REFERENCES 


Daniels, H. E. (1956). The approximate distribution of serial correlation coefficients. Biometrika, 
43, 169. 

Jenkins, G. M. (1956). Tests of hypotheses in the linear autoregressive model. II. Biometrika, 43, 186. 

Leipnik, R. B. (1947). Distribution of the serial correlation coefficient in a circularly correlated 
universe. Ann. Math. Statist. 18, 86. 

Madow, W. G. (1945). Note on the distribution of the serial correlation coefficient. Ann. Math. 
Statist. 16, 308. 


The effect of transformations of variables upon their correlation coefficients 


By M. H. QUENOUILLE 
Research Techniques Unit, London School of Economics 


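Suppose $x_t$ to be normally distributed with zero mean and standard deviation $\sigma$, and let $y_t$ be derived 
from $x_t$ by a monotonic transformation 

\[ y_t = \sum_{i=1}^{\infty} a_i H_i\!\left(\frac{x_t}{\sigma}\right), \]

where $H_i(x)$ is the $i$th Hermite polynomial. Let the correlation between $x_t$ and $x_{t-s}$ be $\rho_s$. 
The correlation between $y_t$ and $y_{t-s}$ may then be derived using the relations 

\[ E\left[H_i\!\left(\frac{x_t}{\sigma}\right)H_j\!\left(\frac{x_{t-s}}{\sigma}\right)\right] = 0 \quad (i \neq j), \qquad = i!\,\rho_s^i \quad (i = j). \]

These little-known relationships spring from the more-familiar equations 

\[ E\left[H_i\!\left(\frac{x_t}{\sigma}\right)H_j\!\left(\frac{x_t}{\sigma}\right)\right] = 0 \quad (i \neq j), \qquad = i! \quad (i = j), \]

by transforming the variables to independent normal functions, $x_t$, $\rho_s x_t + \eta_t$, and using 

\[ H_j(\eta_t + c) = \sum_{k=0}^{j}\binom{j}{k} c^k H_{j-k}(\eta_t). \]

It follows that 

\[ \operatorname{cov}(y_t, y_{t-s}) = \sum_{i=1}^{\infty} a_i^2\, i!\, \rho_s^i \]

and 

\[ \rho(y_t, y_{t-s}) = \frac{a_1^2\rho_s + 2a_2^2\rho_s^2 + 6a_3^2\rho_s^3 + \dots}{a_1^2 + 2a_2^2 + 6a_3^2 + \dots}. \]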















Consequently, $\rho(y_t, y_{t-s})$ is a weighted average of $\rho_s, \rho_s^2, \rho_s^3, \dots$ and, except when $\rho_s = 1$, is necessarily 
less than $\rho_s$. Further, the difference between $\rho(y_t, y_{t-s})$ and $\rho_s$ will be greater for negative $\rho_s$ than for 
positive $\rho_s$. 

It follows that, in any time series, transformation of the observations to normality maximizes the 
serial correlations, and that failure to achieve normality may be marked by the positive serial correlations 
being larger in modulus than the negative serial correlations. 

This latter fact will be most obvious when the serial correlations contain a harmonic component of 
amplitude $b$. In this instance, the average of $\rho_s^n$ over a run of values of $s$ is zero for odd $n$, and 

\[ b\,(n-1)(n-3)\dots/\{n(n-2)\dots 2\} \]

for even $n$. Thus the average of $\rho(y_t, y_{t-s})$ over a run of values of $s$ is 

\[ \frac{b\,\{a_2^2 + (1.3)^2 a_4^2 + (1.3.5)^2 a_6^2 + \dots\}}{a_1^2 + 2a_2^2 + 6a_3^2 + 24a_4^2 + \dots}. \]

This expression may be fairly appreciable. For instance, for the transformation $y_t = \exp(x_t/\sigma) - \exp 0{\cdot}5$, 
$a_i = \sigma^{i}/i!$, and this expression becomes 

\[ \frac{b\left\{\left(\dfrac{\sigma^2}{2}\right)^2 + \dfrac{1}{(2!)^2}\left(\dfrac{\sigma^2}{2}\right)^4 + \dfrac{1}{(3!)^2}\left(\dfrac{\sigma^2}{2}\right)^6 + \dots\right\}}{\sigma^2 + \dfrac{1}{2!}\sigma^4 + \dfrac{1}{3!}\sigma^6 + \dots}. \]

The numerator of this expression may be derived from $J_0(\sigma^2)$, or, more rapidly in most instances, by 
direct calculation. For example, for $\sigma = 1$, this expression is equal to $0{\cdot}155b$, an appreciable shift in the 
mean value. 
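
The figure quoted for $\sigma = 1$ can be reproduced by direct summation. The sketch below assumes that the numerator series is $\sum_{m\ge1}(\sigma^2/2)^{2m}/(m!)^2$ and the denominator series is $\sum_{i\ge1}\sigma^{2i}/i!$, which is one reading of the printed expression; the function name `mean_shift_ratio` and the truncation at 40 terms are illustrative choices.

```python
import math


def mean_shift_ratio(sigma, terms=40):
    """Ratio of the two series (the multiplier of b), by direct summation."""
    half = sigma ** 2 / 2.0
    numerator = sum(half ** (2 * m) / math.factorial(m) ** 2
                    for m in range(1, terms))
    denominator = sum(sigma ** (2 * i) / math.factorial(i)
                      for i in range(1, terms))
    return numerator / denominator


if __name__ == "__main__":
    print(round(mean_shift_ratio(1.0), 3))
```

Both series converge very quickly, so the truncation point has no practical effect for moderate $\sigma$.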


Further properties of an angular transformation of the correlation coefficient 


By B. I. HARLEY 
University College, London 


1. In a previous paper (1956) the properties of the transformation 

\[ y = \sin^{-1} r, \]

where $r$ is the correlation coefficient in a sample of size $n$ from a normal bivariate population with 
correlation $\rho$, were discussed and it was shown that if $\mathcal{E}_n(\sin^{-1} r)$ denotes the expectation of $\sin^{-1} r$ then 

\[ \mathcal{E}_n(\sin^{-1} r) = \sin^{-1}\rho \tag{1} \]

for all values of $n$. This was proved directly only for even values of $n$, but by considering the form of the 
distribution of $r/(1-r^2)^{\frac12}$ we can obtain directly an alternative method to that given in the previous 
paper, of proving (1) for odd values of $n$. The results suggest that the distribution of $r/(1-r^2)^{\frac12}$ may be 
similar to that of the non-central $t$ distribution, and this I have investigated in another paper (Harley, 
1957). 


2. $(x_i, y_i)$ for $i = 1, 2, \dots, n$, are $n$ pairs of values drawn randomly from a normal bivariate population 
with correlation $\rho$. Without loss of generality we can arrange that in the population the mean values of 
$x$ and $y$ are zero, and the standard deviations are unity. If $\bar x$ and $\bar y$ are the means of the $n$ sample values, 
then the coefficient of correlation of the sample is given by 

\[ r = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\left\{\sum_{i=1}^{n}(x_i-\bar x)^2 \sum_{i=1}^{n}(y_i-\bar y)^2\right\}^{\frac12}}. \tag{2} \]

Transforming to the variables 

\[ X = \frac{x-\rho y}{(1-\rho^2)^{\frac12}} \quad\text{and}\quad Y = y, \]

we find that $X^2 + Y^2 = (x^2 - 2\rho xy + y^2)/(1-\rho^2)$, 


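and $(X_i, Y_i)$ may be considered as $n$ pairs of values drawn randomly from a normal bivariate population 
with zero means, unit standard deviations and correlation zero. Writing $\bar X$, $\bar Y$ for the sample means of 
$X$, $Y$ and $S_X^2$, $S_Y^2$ for their variances, it follows from (2) that 

\[ \frac{r}{(1-r^2)^{\frac12}} = \frac{\sum(x_i-\bar x)(y_i-\bar y)}{\left\{\sum(x_i-\bar x)^2\sum(y_i-\bar y)^2 - \left[\sum(x_i-\bar x)(y_i-\bar y)\right]^2\right\}^{\frac12}} \]

\[ = \frac{(1-\rho^2)^{\frac12}\sum(X_i-\bar X)(Y_i-\bar Y) + \rho\sum(Y_i-\bar Y)^2}{(1-\rho^2)^{\frac12}\left\{\sum(X_i-\bar X)^2\sum(Y_i-\bar Y)^2 - \left[\sum(X_i-\bar X)(Y_i-\bar Y)\right]^2\right\}^{\frac12}} \]

\[ = \frac{r_0}{(1-r_0^2)^{\frac12}} + \frac{\rho}{(1-\rho^2)^{\frac12}}\,\frac{S_Y}{S_X}\,\frac{1}{(1-r_0^2)^{\frac12}}, \]

where $r_0$ is the correlation coefficient of a sample drawn randomly from a normal bivariate population 
with zero correlation. 

Let 
\[ \frac{r_0}{(1-r_0^2)^{\frac12}} = t_0 \quad\text{and}\quad \frac{S_Y^2}{S_X^2} = F, \]
then 
\[ \frac{1}{(1-r_0^2)^{\frac12}} = (1+t_0^2)^{\frac12} \]
and 
\[ \frac{r}{(1-r^2)^{\frac12}} = t_0 + F^{\frac12}(1+t_0^2)^{\frac12}\tan(\sin^{-1}\rho). \]
Let $\sin^{-1}\rho = \theta$ and $F^{\frac12}\tan\theta = z$, then 
\[ \frac{r}{(1-r^2)^{\frac12}} = t_0 + z(1+t_0^2)^{\frac12}. \tag{3} \]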
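3. If $E_n = \mathcal{E}_n(\sin^{-1} r)$, then 

\[ E_n = \mathcal{E}_n\left[\tan^{-1}\frac{r}{(1-r^2)^{\frac12}}\right] = \mathcal{E}_n\left(\tan^{-1}[t_0 + z(1+t_0^2)^{\frac12}]\right), \]

\[ \frac{\partial E_n}{\partial\theta} = \mathcal{E}_n\left[\frac{F^{\frac12}\sec^2\theta}{(1+z^2)(1+t_0^2)^{\frac12} + 2t_0 z}\right]. \]

$F$ and $t_0$ are independent for the case we are considering, and thus to derive the expectation we can 
integrate first with respect to $t_0$ and then with respect to $F$. 
The distribution of $t_0$ is known to be 

\[ p(t_0) = \frac{1}{B(\tfrac12 n-1, \tfrac12)\,(1+t_0^2)^{\frac12(n-1)}}. \]

We wish to show that $E_n = \sin^{-1}\rho = \theta$. If $n = 3$ 

\[ p(t_0) = \frac{1}{\pi(1+t_0^2)}, \qquad p(F) = \frac{1}{(1+F)^2}. \]

Hence 

\[ \frac{\partial E_3}{\partial\theta} = \int_0^\infty\!\!\int_{-\infty}^{\infty} \frac{F^{\frac12}\sec^2\theta\; dt_0}{\pi\{(1+z^2)(1+t_0^2)^{\frac12} + 2t_0 z\}(1+t_0^2)}\; p(F)\, dF. \]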












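Let $t_0 = \sinh\phi$; then the integral with respect to $t_0$ becomes 

\[ \int_{-\infty}^{+\infty} \frac{\cosh\phi\, d\phi}{\{(1+z^2)\cosh\phi + 2z\sinh\phi\}\cosh^2\phi} = \int_{-\infty}^{+\infty} \frac{\operatorname{sech}^2\phi\, d\phi}{1+z^2+2z\tanh\phi} \]

\[ = \frac{1}{2z}\Big[\log\{(1+z^2) + 2z\tanh\phi\}\Big]_{-\infty}^{+\infty} = \frac{1}{2z}\log\frac{(1+F^{\frac12}\tan\theta)^2}{(1-F^{\frac12}\tan\theta)^2} = \frac{2}{z}\tanh^{-1}(F^{\frac12}\tan\theta). \]

Hence 

\[ \frac{\partial E_3}{\partial\theta} = \int_0^\infty \frac{2\sec^2\theta}{\pi\tan\theta}\tanh^{-1}(F^{\frac12}\tan\theta)\,(1+F)^{-2}\,dF. \]

Integrating by parts we find that $\partial E_3/\partial\theta = 1$. Thus $E_3 = \sin^{-1}\rho + C$, where $C$ is a constant. When 
$\rho = 0$ we have 

\[ E_3 = \int_{-1}^{+1} \frac{\sin^{-1} r}{\pi}(1-r^2)^{-\frac12}\,dr = \frac{1}{\pi}\Big[(\sin^{-1} r)^2\Big]_{-1}^{+1} - \frac{1}{\pi}\int_{-1}^{+1}\sin^{-1} r\,(1-r^2)^{-\frac12}\,dr = 0. \]

Thus $C = 0$ and, in general, $E_3 = \sin^{-1}\rho$. 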
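In a previous paper (1956) we deduced the following recurrence relationship: 

\[ \frac{1}{(1-\rho^2)^{\frac12}} - \frac{\partial}{\partial\rho}\mathcal{E}_{n-1}(\sin^{-1} r) = \frac{\rho}{(1-\rho^2)}(n-2)\left[\mathcal{E}_{n-1}(\sin^{-1} r) - \mathcal{E}_{n+1}(\sin^{-1} r)\right]. \tag{4} \]

If $n-1 = 3$, we have just seen that 
\[ \mathcal{E}_{n-1}(\sin^{-1} r) = \mathcal{E}_3(\sin^{-1} r) = \sin^{-1}\rho. \]
Consequently the left-hand side of equation (4) is zero and it follows that 
\[ \mathcal{E}_3(\sin^{-1} r) = \mathcal{E}_5(\sin^{-1} r) = \sin^{-1}\rho. \]
Substituting $(n-1) = 5, 7, 9, \dots$ in (4) in turn, we see that 

\[ \sin^{-1}\rho = \mathcal{E}_3(\sin^{-1} r) = \mathcal{E}_5(\sin^{-1} r) = \mathcal{E}_7(\sin^{-1} r) = \dots. \]

We have shown previously that 
\[ \sin^{-1}\rho = \mathcal{E}_4(\sin^{-1} r) = \mathcal{E}_6(\sin^{-1} r) = \mathcal{E}_8(\sin^{-1} r) = \dots. \]
Hence $\mathcal{E}_n(\sin^{-1} r) = \sin^{-1}\rho$ for $n \geq 3$. 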
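
A rough simulation check of the result for the smallest sample size, $n = 3$, can be sketched as follows. This is not part of the proof; the function names, the seed, the number of trials and the value $\rho = 0.5$ are all arbitrary illustrative choices, and agreement is only to within sampling error.

```python
import math
import random


def sample_r(n, rho, rng):
    """Correlation coefficient of one sample of n pairs from a bivariate
    normal population with zero means, unit variances and correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        x = rho * y + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    # Guard against rounding pushing r marginally outside [-1, 1].
    return max(-1.0, min(1.0, r))


def mean_arcsin_r(n, rho, trials, seed=0):
    """Monte Carlo estimate of the expectation of arcsin(r)."""
    rng = random.Random(seed)
    return sum(math.asin(sample_r(n, rho, rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    estimate = mean_arcsin_r(3, 0.5, 100_000, seed=1)
    print(estimate, math.asin(0.5))
```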


REFERENCES 


Harley, B. I. (1956). Biometrika, 43, 219. 
Harley, B. I. (1957). Biometrika, 44, 219. 



Heterogeneity of error variances in a randomized block design 


By FRANKLIN A. GRAYBILL anp JOHN LEROY FOLKS 
Oklahoma Agricultural and Mechanical College 


1. INTRODUCTION 


In a randomized block experiment we frequently wish to test the hypothesis that all the treatment means 
are equal. When we have heterogeneity of error variances, however, the ratio of the treatment mean 
square to the error mean square is not distributed as Snedecor's $F$. For example, consider a randomized 
block design with $p$ treatments occurring on each of $r$ blocks. If each of the first $n_1$ treatments has 
variance $\sigma_1^2$ and each of the next $n_2$ treatments has variance $\sigma_2^2$, etc., and if 

\[ \sum_{i=1}^{q} n_i = p, \]

the mathematical model may be written as 

\[ Y_{ijk} = \mu + t_{ij} + b_k + e_{ijk} \quad (i = 1, 2, \dots, q;\ j = 1, 2, \dots, n_i;\ k = 1, 2, \dots, r), \tag{1$\cdot$1} \]

where the $e_{ijk}$'s are assumed to be normally distributed such that 
$\mathcal{E}(e_{ijk}) = 0$ for all $i$, $j$ and $k$; $\mathcal{E}(e_{ijk}^2) = \sigma_i^2$ for all $j$ and $k$ 
and $\mathcal{E}(e_{ijk}e_{smn}) = 0$ unless $i = s$, $j = m$ and $k = n$. 
Graybill (1954) has given an exact test, based on Hotelling's $T^2$, for testing the hypothesis, 
\[ H_0: t_{11} = t_{12} = \dots = t_{1n_1} = t_{21} = \dots = t_{qn_q}. \]
This test is valid if $r > p-1$ and requires inverting a matrix of order $p-1$. 
The purpose of this paper is to give an exact test of $H_0$ which is simpler than that given by Graybill 
when $n_i > 1$ for at least one $i$. If $n_1 = n_2 = \dots = n_q = 1$, then the test now proposed reduces to that given 
by Graybill. The latter is valid when $r > p-1$ regardless of the values of $n_i$, but the new method is valid 
if $r > q-1$ and requires the inversion of a matrix of order $q-1$. Therefore, if $n_i > 1$ for at least one $i$, 
the method proposed in this paper is considerably simpler than the earlier method. 


2. THE TEST CRITERION 


Using the $i$th subset of observations* let us conduct an analysis of variance as in Table 1 for each subset 
for which $n_i > 1$. 


Table 1. Analysis of variance for i-th subset 

Source of variation    Degrees of freedom    Sum of squares
Blocks                 $r-1$                 $n_i\sum_k (y_{i.k}-y_{i..})^2 = A_i$
Treatments             $n_i-1$               $r\sum_j (y_{ij.}-y_{i..})^2 = B_i$
Error                  $(r-1)(n_i-1)$        $\sum_{j,k}(Y_{ijk}-y_{i.k}-y_{ij.}+y_{i..})^2 = C_i$

The ratio $(r-1)B_i/C_i = F_i$ (where $Y_{ij.}$ indicates $Y_{ijk}$ summed over $k$ and $y_{ij.}$ indicates the 
average of $Y_{ijk}$ when summed over $k$, etc.) is distributed as Snedecor's $F$ with $(n_i-1)$ and $(r-1)(n_i-1)$ 
degrees of freedom if and only if the hypothesis $H_i$: $(t_{i1} = t_{i2} = \dots = t_{in_i})$ is true. 

If we let $q-1$ represent the exact number of subsets of $Y_{ijk}$ in which $n_i > 1$, then we will run an analysis 
of variance as in Table 1 for each of these subsets. Also since $Y_{ijk}$ is independent of $Y_{smn}$ if $s \neq i$ (regardless 
of the values of $j$, $k$, $m$ and $n$), these $q-1$ analyses of variance will yield $q-1$ independent $F_i$, each 
distributed as $F$ with the appropriate degrees of freedom if $H_i$ $(i = 1, 2, \dots, q-1)$ is true. For definiteness let 
us assume that the first $q-1$ subsets each have $n_i > 1$. That is to say, suppose $n_1 > 1, n_2 > 1, \dots, n_{q-1} > 1$. 
This in no way affects the generality of the discussion and the result is that if the hypotheses 

\[ H_1: (t_{11} = t_{12} = \dots = t_{1n_1}); \quad H_2: (t_{21} = t_{22} = \dots = t_{2n_2}); \quad \dots; \quad H_{q-1}: (t_{q-1,1} = t_{q-1,2} = \dots = t_{q-1,n_{q-1}}) \]

are each true, then $F_1, F_2, \dots, F_{q-1}$ are independently distributed as $F$ with degrees of freedom 

\[ (n_1-1),\ (r-1)(n_1-1); \quad (n_2-1),\ (r-1)(n_2-1); \quad \dots; \quad (n_{q-1}-1),\ (r-1)(n_{q-1}-1) \]

respectively. 


* By the $i$th subset of observations we will mean all observations where the first subscript is equal to $i$. 
For example the second subset of observations consists of $Y_{2jk}$ where $j = 1, 2, \dots, n_2$; $k = 1, 2, \dots, r$. 





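If we average the model (1$\cdot$1) over the subscript $j$ we get 

\[ \sum_{j=1}^{n_i}\frac{Y_{ijk}}{n_i} = \mu + \sum_{j=1}^{n_i}\frac{t_{ij}}{n_i} + b_k + \sum_{j=1}^{n_i}\frac{e_{ijk}}{n_i}. \tag{2$\cdot$1} \]

If we denote $\sum_{j} Y_{ijk}/n_i$ by $B_{ik}$, and let $T_i = \sum_j t_{ij}/n_i$ and $d_{ik} = \sum_j e_{ijk}/n_i$, then we can write (2$\cdot$1) as 

\[ B_{ik} = \mu + T_i + b_k + d_{ik}. \tag{2$\cdot$2} \]

From the assumptions in (1$\cdot$1) 

\[ \mathcal{E}(d_{ik}) = \mathcal{E}\sum_j e_{ijk}/n_i = 0, \qquad \mathcal{E}(d_{ik}^2) = \sigma_i^2/n_i, \]
\[ \mathcal{E}(d_{ik}d_{is}) = 0 \text{ for } k \neq s; \qquad \mathcal{E}(d_{ik}d_{vk}) = 0 \text{ for } i \neq v. \]

Thus (2$\cdot$2) is the model considered by Graybill (1954) and we can use Hotelling's $T^2$ to test the hypothesis 
$T_1 = T_2 = \dots = T_q$. The procedure is to define the $(q-1) \times 1$ column vector $X_k$ by 

\[ X_k = \begin{pmatrix} B_{1k} - B_{qk} \\ B_{2k} - B_{qk} \\ \vdots \\ B_{q-1,k} - B_{qk} \end{pmatrix} \]

and define the $(q-1) \times 1$ vector $\bar X = \sum_{k=1}^{r} X_k/r$. Then if $H_0$ is true, the quantity 

\[ F_q = \frac{r(r-q+1)}{q-1}\,\bar X'\Big\{\sum_{k=1}^{r}(X_k-\bar X)(X_k-\bar X)'\Big\}^{-1}\bar X \]

is distributed as $F$ with $q-1$ and $r-q+1$ degrees of freedom (if $r > q-1$). 
We will now show that $F_1, F_2, \dots, F_q$ are jointly independently distributed as $F$ if $H_0$ is true. 
Let $(y_{ij.} - y_{i..}) = u_{ij}$ and $(Y_{ijk} - y_{i.k} - y_{ij.} + y_{i..}) = v_{ijk}$. Since $F_1, F_2, \dots, F_{q-1}$ are each functions of 
$u_{ij}$ and $v_{ijk}$, and since $F_q$ is a function of $B_{mn}$, and since $u_{ij}$, $v_{ijk}$ and $B_{mn}$ are all normally distributed, it 
follows that $F_1, F_2, \dots, F_{q-1}, F_q$ are jointly independent if $\operatorname{cov}(B_{mn}, u_{ij}) = \operatorname{cov}(B_{mn}, v_{ijk}) = 0$ for all 
$i$, $j$, $k$, $m$ and $n$. 
It is obvious that $\operatorname{cov}(B_{mn}, u_{ij}) = 0$ for $m \neq i$. Let us consider the case when $m = i$: 

\[ \operatorname{cov}(B_{in}, u_{ij}) = \mathcal{E}(e_{i.n})(e_{ij.} - e_{i..}) = \frac{\sigma_i^2}{n_i r} - \frac{\sigma_i^2}{n_i r} = 0. \]

Also it is immediately clear that $\operatorname{cov}(B_{mn}, v_{ijk}) = 0$ for $m \neq i$. Let us consider the case when $m = i$: 

\[ \operatorname{cov}(B_{in}, v_{ijk}) = \mathcal{E}(e_{i.n})(e_{ijk} - e_{ij.} - e_{i.k} + e_{i..}) = \delta_{kn}\frac{\sigma_i^2}{n_i} - \frac{\sigma_i^2}{n_i r} - \delta_{kn}\frac{\sigma_i^2}{n_i} + \frac{\sigma_i^2}{n_i r} = 0, \]

where $e_{i.n}$, $e_{ij.}$, etc., denote averages over the omitted subscripts and $\delta_{kn}$ is 1 if $k = n$ and 0 otherwise. 

We have, therefore, that the $F_i$ $(i = 1, 2, \dots, q)$ are jointly independent and each is distributed as $F$ if, 
and only if, $H_0$ is true; i.e. if, and only if, $t_{11} = t_{12} = \dots = t_{1n_1} = t_{21} = \dots = t_{qn_q}$. We can test $H_0$, therefore, 
by combining $q$ independent tests of significance. 

In a recent paper Birnbaum (1954) has summarized various methods of combining independent tests 
of significance including Fisher's test (Fisher, 1946, § 21$\cdot$1). If Fisher's test is used, then the procedure 
for testing $H_0$ is to run the $q-1$ analyses of variance (for which $n_i > 1$) and tabulate the significance level 
of each $F_i$ value; call these significance levels $P_1, P_2, \dots, P_{q-1}$. If $r > q-1$ then $F_q$ can be obtained and 
the significance level $P_q$ tabulated (if $r \leq q-1$ the test proposed in this paper breaks down). If $H_0$ is 
true then 

\[ P = -\sum_{i=1}^{q} 2\log_e P_i \]

is distributed as a $\chi^2$ variate with $2q$ degrees of freedom. 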
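
The final combination step can be sketched in a few lines. The code below is a minimal illustration of Fisher's method only: it takes the $q$ significance levels as given and does not compute the individual $F$ tests. The function names are illustrative, and the $\chi^2$ tail area uses the closed form available when the degrees of freedom are even.

```python
import math


def chi2_sf_even(x, dof):
    """Upper tail area of chi-square with an even number of degrees of freedom:
    exp(-x/2) * sum_{j < dof/2} (x/2)**j / j!."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combined(p_values):
    """Return (P, overall significance level), where P = -2 * sum(log p_i)
    is referred to chi-square with 2q degrees of freedom."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even(stat, 2 * len(p_values))


if __name__ == "__main__":
    # Hypothetical significance levels P_1, ..., P_q, for illustration only.
    stat, level = fisher_combined([0.20, 0.04, 0.10])
    print(round(stat, 3), round(level, 4))
```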


REFERENCES 


Birnbaum, A. (1954). J. Amer. Statist. Ass. 49, 559. 
Fisher, R. A. (1946). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd. 
Graybill, F. A. (1954). Biometrics, 10, 516. 










An extension property of a class of balanced incomplete block designs 


By G. P. SILLITTO 
Imperial Chemical Industries Ltd, Metals Division 


Let a balanced incomplete block design which enables $v$ varieties or treatments to be compared in $b$ 
blocks (each block accommodating $k$ varieties, each variety occurring once in each of $r$ blocks, and each 
pair of varieties occurring together in $\lambda$ of the blocks) be represented by a matrix having $v$ rows and $b$ 
columns, in which the element in the $i$th row and $j$th column is $+1$ if the $i$th variety is tested in the $j$th 
block, and $-1$ if this variety is not tested in the $j$th block. Then it is not difficult to show that the rows 
of the matrix are orthogonal to each other if, and only if, 

\[ b = 4(r-\lambda). \tag{1} \]

This note is concerned with balanced incomplete block designs for which (1) holds, and in particular 
with pointing out a 'law of composition' which enables a third design to be obtained by a simple process 
from any two such designs. 

Using the known relations for any balanced incomplete block design 

\[ rv = bk, \tag{2} \]
\[ \lambda = r(k-1)/(v-1), \tag{3} \]
a little algebra shows that if (1) is to be true, then necessarily 
\[ 2k = v \pm \sqrt{v}, \tag{4} \]
so that (1) can only hold if $v$ is the square of an integer. If (1) is true, the two values for $k$ given by (4) 
are the values for a design and its complement. However, (4) serves to determine only the ratios $b/r$ 
and $\lambda/r$, through (2) and (3) respectively. Hence what may be called the 'simple multiples' of a design 
$(v, b, k, r, \lambda)$, i.e. the spectrum of designs $(v, nb, k, nr, n\lambda)$ in which $n$ is an integer or any fraction which 
makes $nb$, $nr$ and $n\lambda$ all integral, are covered by the same conditions as have been discussed. The value 
of this observation arises from the fact that when a design allowed by (2) and (3) does not exist, or is 
difficult to find, some of its simple multiples often exist and can easily be found. 

Designs conforming to (1), (2) and (3) are of the form 

\[ v = u^2, \quad b = 2uN, \quad k = \tfrac12 u(u \pm 1), \quad r = (u \pm 1)N, \quad \lambda = \tfrac12(u \pm 2)N, \tag{5} \]
where $u$ is an integer $> 1$; $N$ is an integer $\geq \tfrac12 u$ such that $b$, $r$ and $\lambda$ are integers; and designs with the 
parameters corresponding to $+$ signs represent the complements of the designs with parameters 
corresponding to $-$ signs. From any of these designs which exist and can be found, other designs can be 
obtained by using the law of composition given in the next paragraph. 

From any pair of known designs for which (1) is true there can be found a third design, by the following 
process. Call the matrices of the two known designs $M_1$ and $M_2$; then the direct product $M_1 \times M_2$ of 
these two matrices is obtained by substituting for each element $+1$ of $M_1$ the matrix $M_2$, and for each 
element $-1$ of $M_1$ the matrix $-M_2$. Then the matrix $M_3 = M_1 \times M_2$ has $v_3 = v_1 v_2$ rows and $b_3 = b_1 b_2$ 
columns, and all its elements are $+1$ or $-1$. A little consideration shows that all the rows of $M_3$ contain 
the same number $r_3 = r_1 r_2 + (b_1 - r_1)(b_2 - r_2)$ of $+1$ elements, and all the columns contain the same 
number $k_3 = k_1 k_2 + (v_1 - k_1)(v_2 - k_2)$ of $+1$ elements. Further consideration of the different kinds of 
row which occur in $M_3$ shows that in any pair of rows, the number of columns in which both rows have 
$+1$ elements is $\lambda_3 = 6r_1 r_2 - 8r_1\lambda_2 - 8r_2\lambda_1 + 12\lambda_1\lambda_2$. Hence $M_3$ is the matrix of a balanced incomplete 
block design with parameters $v_3$, $b_3$, $k_3$, $r_3$, $\lambda_3$. 

It is easily found also that $M_3$ conserves the property of row-orthogonality, and hence it in turn can 
be used as a factor in forming the direct product matrix representing yet another design; and so on. 
Clearly also in the direct product $M_1 \times M_2$, $M_2$ can be identical with, or the complement of, $M_1$, if desired. 
Therefore, if one or more of the designs in the series (5) are known, an infinite number of balanced 
incomplete block designs can be constructed. 
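
The law of composition can be tried out on the smallest case. The sketch below forms the direct product of the unreduced design with $v = b = 4$, $k = 3$ and its complement, and then counts the parameters of the result directly from the $\pm 1$ matrix. The helper names `direct_product` and `bibd_parameters` are illustrative, and the brute-force counting is for checking only, not an efficient construction.

```python
from itertools import combinations


def direct_product(m1, m2):
    """Replace each +1 of m1 by m2 and each -1 of m1 by -m2."""
    return [[a * b for a in row1 for b in row2] for row1 in m1 for row2 in m2]


def bibd_parameters(m):
    """Return (v, b, k, r, lambda) for a +1/-1 incidence matrix,
    raising ValueError if replications, block sizes or pair counts vary."""
    v, b = len(m), len(m[0])
    r_set = {sum(1 for e in row if e == 1) for row in m}
    k_set = {sum(1 for row in m if row[j] == 1) for j in range(b)}
    lam_set = {
        sum(1 for x, y in zip(r1, r2) if x == 1 and y == 1)
        for r1, r2 in combinations(m, 2)
    }
    if len(r_set) != 1 or len(k_set) != 1 or len(lam_set) != 1:
        raise ValueError("matrix does not represent a balanced design")
    return v, b, k_set.pop(), r_set.pop(), lam_set.pop()


# Unreduced design: 4 varieties, each block omits exactly one variety.
m1 = [[-1 if i == j else 1 for j in range(4)] for i in range(4)]
m2 = [[-e for e in row] for row in m1]   # its complement
m3 = direct_product(m1, m2)

if __name__ == "__main__":
    print(bibd_parameters(m1))
    print(bibd_parameters(m3))
```

For this pair the counted values can be compared with $r_3 = r_1 r_2 + (b_1 - r_1)(b_2 - r_2)$ and the other composition formulae given above.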

The first members of the series (5) are 


 u    N     v     b     k     r    $\lambda$    Notes
 2    1     4     4     3     3    2    Known, unreduced design
 3    2     9    12     3     4    1    Known, see Fisher & Yates (1953)
 4    2    16    16     6     6    2    Known, see below
 5    4    25    40    10    16    6    Not known, but see below
 6    3    36    36    15    15    6    Known, see below







A design with $u = 4$ and $N = 2$ is well known, being given by Fisher & Yates (1953). Such a design 
can also be obtained by composition of the design for $u = 2$, $N = 1$ with its own complement by the 
direct-product-matrix method. 

No design corresponding to $u = 5$, $N = 4$ appears to be known, but the 3/2 simple multiple of it, i.e. a 
design with $u = 5$, $N = 6$, $v = 25$, $b = 60$, $k = 10$, $r = 24$, $\lambda = 9$, is obtained (using the terminology of 
Bose, 1939) by considering the modul of residue classes (mod 5) and letting five varieties $u_1, u_2, u_3, u_4, u_5$ 
correspond to any element $u$ of the modul. Then the design is obtained by adding the elements of the 
modul to the following twelve initial blocks, keeping the suffixes invariant: 


(0,, 11, 21, 33, 3a, 4, 23, 14, 05, 45) (01, 11, 21, 31, 3g, 25, 45, 14, 44, 05) 
(0,, 4;, 02, le, 29, 32, 33, 43, 24, 15) (0, Oo, 14, 22, 32, 3g, 24, 44, 15, 45) 
(1,1, 02, 4, 03, 1s, 25, 33, 34, 44, 25) (1,, 41, 02, 03, 13, 25, 33, 34, 25, 45) 
(21, 12, 03, 43, Og 14, 24, 34, 35, 45) (21, 41, La, 42 Og, Og, Ly, 245 34, 35) 
(31, 41, 2a, 15, O4, 44, 05, 15, 25, 35) (31, 29, 42, 15, 45, 04, 05, 15, 25, 35) 
(01, 11, Og, 1g, Og, 1g, Og, 1a, 95, 15) (01, 21, Og, 22, Og, 25, O4, 24, 05, 25) 


No design corresponding to $u = 6$, $N = 3$ appears to have been published, but one can be obtained by 
considering the modul of residue classes (mod 3, 3) and letting four varieties $u_1, u_2, u_3, u_4$ correspond 
to every element $u$ of the modul. Then the design is obtained by adding the elements of the modul to the 
following four initial blocks, keeping the suffixes invariant: 


(00,, 02,, 10,, 12,, 20,, 22,, O1,, 125, 20,, 005, 01, 025, 004, 12,, 214) 
(00,, 12,, 21,, 00,, 025, 10,, 125, 20,, 22,, Oly, 125, 205, 004, 01,, 02,) 
(00,, 01,, 02,, 00,, 12,, 215, 003, 025, 105, 125, 205, 225, O1,, 124, 20,) 
(01,, 12,, 20,, 00,, 015, 02,, 005, 125, 215, 004, 02,, 104, 124, 20,, 22,) 


The author acknowledges, with thanks, suggestions made by the referee which have been incorporated 
in this note. 


REFERENCES 
Bose, R. C. (1939). On the construction of balanced incomplete block designs. Ann. Eugen., Lond., 
9, 353-99. 

Fisher, R. A. & Yates, F. (1953). Statistical Tables for Biological, Agricultural and Medical Research, 
4th ed. Edinburgh: Oliver and Boyd. 


Sequentially determined confidence intervals 


By N. L. JOHNSON 
University College, London 


1. A number of authors (see Anscombe (1953) for a list of references, and in particular Stein & Wald 
(1947) and Wolfowitz (1950)) have shown that in many special cases the sequential method of approach 
does not lead to any marked advantage in problems of estimation, as compared with standard fixed 
sample procedures. It is the purpose of this note to demonstrate, in a simple manner, that this is generally 
the case under moderately simple conditions. 

Suppose that a sequential procedure leading to the construction of a confidence interval for a 
parameter $\theta$ has the following properties: 

(i) the probability that it terminates after $n$ stages is $P_n$, 

(ii) the conditional probability of obtaining a correct confidence interval statement, given that the 
sample terminates after $n$ stages, is $\omega_n$, and $\omega_n$ is independent of $\theta$. 

We define an optimum procedure as a procedure such that 

(a) the overall confidence coefficient $\sum_{n=1}^{\infty}\omega_n P_n$ is equal to $\Omega$, 

(b) subject to (a) the average sample number $\sum_{n=1}^{\infty} nP_n$ is a minimum. 

The two theorems which follow show that under fairly general conditions effectively optimum 
procedures need have at most two non-zero $P_n$'s, and that, if certain further conditions are satisfied, either 
(a) the two non-zero $P$'s are $P_m$ and $P_{m+1}$, where $\omega_m < \Omega < \omega_{m+1}$ are the nearest values of the $\omega$'s to $\Omega$, or 
(b) if there is an $\omega_m = \Omega$, then the procedure with $P_m = 1$ is optimum. This means that optimum 
procedures, as defined above, are either fixed sample procedures, or close approximations thereto. 










The problem, and its solution, is similar to problems arising in the theory of linear programming, in 
particular those described by Beale (1955). 


2. Theorem 1. If $P_1, P_2, P_3, \dots$ satisfy the conditions 

(i) $P_n \geq 0$, 

(ii) $\sum_{n=1}^{\infty} P_n = 1$, 

(iii) $\sum_{n=1}^{\infty} \omega_n P_n = \Omega$ $(\omega_n > 0,\ \Omega > 0)$ 

and $b_1, b_2, b_3, \dots$ are finite real numbers then there is a set $\{P_n\}$ containing not more than two non-zero values, 
for which either 

(a) $B = \sum_{n=1}^{\infty} b_n P_n$ takes its minimum possible value, or 

(b) $B$ takes a value exceeding its lower bound by an arbitrarily small amount. 

Proof. We first prove that if $B$ attains its minimum value for a set with a finite number of non-zero 
$P$'s this number need not be greater than two. 
Suppose that a finite number $N$ $(\geq 3)$ of non-zero $P$'s, $P_{\alpha_1}, P_{\alpha_2}, \dots, P_{\alpha_N}$ say, satisfy conditions (i)-(iii) 
of the theorem. Then values of the ratios $\delta_{\alpha_1} : \delta_{\alpha_2} : \dots : \delta_{\alpha_N}$ can be found satisfying the conditions 

\[ \sum_{i=1}^{N}\delta_{\alpha_i} = \sum_{i=1}^{N}\omega_{\alpha_i}\delta_{\alpha_i} = 0. \tag{1} \]

If we define $\delta_j = 0$ for all $j$ not equal to any $\alpha_i$, then the set $\{P_n + \delta_n\}$ will satisfy conditions (i)-(iii) 
provided $\min(P_{\alpha_i} + \delta_{\alpha_i}) \geq 0$. By a suitable choice of signs of the $\delta$'s we can ensure that 

\[ \sum_{i=1}^{N} b_{\alpha_i}(P_{\alpha_i} + \delta_{\alpha_i}) \leq \sum_{i=1}^{N} b_{\alpha_i}P_{\alpha_i}. \]

Hence, increasing the numerical values of the $\delta$'s until $\min(P_{\alpha_i} + \delta_{\alpha_i}) = 0$, we obtain a fresh set of $P$'s 
satisfying (i)-(iii) with $N'$ $(< N)$ non-zero values $P_{\beta_1}, P_{\beta_2}, \dots, P_{\beta_{N'}}$ and 

\[ \sum_{i=1}^{N'} b_{\beta_i}P_{\beta_i} \leq \sum_{i=1}^{N} b_{\alpha_i}P_{\alpha_i}. \]

This argument can be repeated as long as $N \geq 3$. Since we start with a finite number of non-zero $P$'s, 
the case $N = 3$ will eventually be reached, and a further application of the argument will reduce the 
number of non-zero $P$'s to two. 

Hence if $B$ has a minimum value attained with a finite number of non-zero $P$'s this number need not 
be greater than two. 

Now consider sets of values $\{P_n\}$ containing an infinity of non-zero $P$'s. Since the $P$'s must satisfy 
conditions (i) and (ii) the value of $B$ for each set must be the limit of the values of this sum for a sequence 
of sets with finite (though increasing) numbers of non-zero $P$'s. 

The value of $B$ for each of these sets cannot be less than the value for some set with at most two 
non-zero $P$'s. Hence it is always possible to find a set $\{P_n\}$ with at most two non-zero $P$'s for which $B$ 
exceeds its lower bound by an arbitrarily small amount. 

Corollary. If, among the values of $B$ for all sets $\{P_n\}$ with not more than two non-zero $P$'s there is a 
minimum value, then this is the minimum value of $B$ for all sets of $\{P_n\}$ satisfying (i)-(iii). 


3. Theorem 2. If (i) $b_n$ and $\omega_n$ are each increasing functions of $n$ and (ii) the curve with parametric 
equation $(b_n, \omega_n)$ is convex from above, i.e. if $n < n' < n''$, then 

\[ (b_{n''} - b_n)\,\omega_{n'} > (b_{n''} - b_{n'})\,\omega_n + (b_{n'} - b_n)\,\omega_{n''}, \tag{2} \]

then there is a set $\{P_n\}$ satisfying conditions (i)-(iii) of Theorem 1 which minimizes $B$, and this is the set 
containing either 

(a) only the two non-zero values $P_m$, $P_{m+1}$, where $m$ is defined by $\omega_m < \Omega < \omega_{m+1}$, or 

(b) the single non-zero value $P_m = 1$ if there is an $\omega_m = \Omega$. 

Proof. This theorem is suggested by intuitive geometrical considerations, but the proof given below 
is perhaps more satisfying. 

(i) Assume that there is no $\omega_n$ equal to $\Omega$, but that $\omega_m < \Omega < \omega_{m+1}$. 

Consider the set $\{P_n\}$ containing just two non-zero values $P_\alpha$, $P_\beta$ with $\alpha < \beta$. These values must satisfy 
the equations 

\[ P_\alpha + P_\beta = 1, \qquad \omega_\alpha P_\alpha + \omega_\beta P_\beta = \Omega. \tag{3} \]


















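Hence 
\[ P_\alpha = \frac{\omega_\beta - \Omega}{\omega_\beta - \omega_\alpha}, \qquad P_\beta = \frac{\Omega - \omega_\alpha}{\omega_\beta - \omega_\alpha} \]
and 
\[ b_\alpha P_\alpha + b_\beta P_\beta = \frac{1}{\omega_\beta - \omega_\alpha}\,[\,b_\alpha(\omega_\beta - \Omega) + b_\beta(\Omega - \omega_\alpha)\,]. \]
Evidently we must have $\omega_\alpha < \Omega < \omega_\beta$, i.e. $\alpha \leq m < m+1 \leq \beta$. 
If $\beta > m+1$ we can replace $\beta$ by $\beta'$ with 
\[ \beta > \beta' \geq m+1. \]

The new values $P'_\alpha$, $P'_{\beta'}$ must satisfy equations analogous to (3). Hence 

\[ b_\alpha P'_\alpha + b_{\beta'}P'_{\beta'} = \frac{1}{\omega_{\beta'} - \omega_\alpha}\,[\,b_\alpha(\omega_{\beta'} - \Omega) + b_{\beta'}(\Omega - \omega_\alpha)\,] \]

and 

\[ b_\alpha P'_\alpha + b_{\beta'}P'_{\beta'} - b_\alpha P_\alpha - b_\beta P_\beta = \frac{\Omega - \omega_\alpha}{(\omega_{\beta'} - \omega_\alpha)(\omega_\beta - \omega_\alpha)}\,[\,(b_{\beta'} - b_\alpha)\omega_\beta + (b_\beta - b_{\beta'})\omega_\alpha - (b_\beta - b_\alpha)\omega_{\beta'}\,] < 0, \]

by reason of the convexity condition (2), since $\beta > \beta' > \alpha$. 

Hence $B$ is decreased by replacing $\beta$ by $\beta' < \beta$. Similarly, if $\alpha < m$, $B$ is decreased by replacing $\alpha$ by 
$\alpha'$ where $\alpha < \alpha' \leq m$. Hence the value of $B$ is a minimum for the set $\{P_n\}$ with the two non-zero values 
$P_m$, $P_{m+1}$, and the value of this minimum is 

\[ \frac{1}{\omega_{m+1} - \omega_m}\,[\,b_m(\omega_{m+1} - \Omega) + b_{m+1}(\Omega - \omega_m)\,]. \]

From the corollary to Theorem 1 this is also the minimum value of $B$ for all sets $\{P_n\}$ satisfying (i)-(iii) 
of Theorem 1. 

(ii) If $\omega_m = \Omega$ a similar argument leads to the conclusion that the minimum value of $B$, among all 
sets $\{P_n\}$ satisfying (i)-(iii) of Theorem 1 with just two non-zero values, is attained by the set with 
$P_{m-1}$, $P_{m+1}$ not equal to zero. The value of $B$ for this set is 

\[ \frac{1}{\omega_{m+1} - \omega_{m-1}}\,[\,b_{m-1}(\omega_{m+1} - \Omega) + b_{m+1}(\Omega - \omega_{m-1})\,]. \tag{4} \]

But the value of $B$ for the set $\{P_n\}$ with $P_m = 1$ is $b_m$, which is less than (4) by the convexity property (2). 
Hence if $\omega_m = \Omega$, the set $P_m = 1$ minimizes the value of $B$. 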


4. Putting $b_n = n$ in Theorem 2, it follows that if the curve $(n, \omega_n)$ is convex from above (as defined 
by (2)) then the optimum sequential procedure for constructing confidence intervals with assigned 
confidence coefficient $\Omega$ is a fixed sample procedure or a close approximation thereto. This covers a 
large number of cases occurring in statistical theory, including, for example, all cases where $\omega_n$ is 
proportional to 

\[ \frac{1}{\sqrt{(2\pi)}}\int_0^{c\sqrt{n}} e^{-\frac12 t^2}\,dt \]

(see also Anscombe, 1949). 

Even if the curve $(n, \omega_n)$ is not convex from above, Theorem 1 shows that there will be effectively 
optimum procedures with only two sample sizes, although these may now differ by more than unity. 
The results obtained in this note are valid for any decision procedure (i.e. not only for the construction 
of confidence intervals) provided the $\omega_n$'s depend only on $n$ and not on the values of the population 
parameters or the procedure by which $n$ is chosen. For this reason they do not apply, for example, to the 
construction of sequential tests comparing two hypotheses, nor would they apply to the construction 
of confidence intervals if $\omega_n$ were allowed to depend on the parameter $\theta$ being estimated (subject to the 
condition $\sum\omega_n P_n = \Omega$ independent of $\theta$). 
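
The two-sample-size procedure of Theorem 2 is easy to exhibit numerically. The sketch below assumes, purely for illustration, a coverage curve $\omega_n = \operatorname{erf}(c\sqrt{n/2})$ with $c = 0.5$ (that is, twice the normal integral from 0 to $c\sqrt{n}$), which increases with $n$ and is convex from above; the function names and the target $\Omega = 0.95$ are likewise assumptions.

```python
import math


def omega(n, c=0.5):
    """Assumed coverage after n stages: 2 * Phi(c * sqrt(n)) - 1."""
    return math.erf(c * math.sqrt(n / 2.0))


def two_point_optimum(target, omega_fn, n_max=10_000):
    """Return {sample size: probability} using only the adjacent sizes m, m+1
    that bracket the target, or the single size m if omega_m equals it."""
    for m in range(1, n_max):
        lo, hi = omega_fn(m), omega_fn(m + 1)
        if lo == target:
            return {m: 1.0}
        if lo < target < hi:
            p_hi = (target - lo) / (hi - lo)
            return {m: 1.0 - p_hi, m + 1: p_hi}
    raise ValueError("target not attainable within n_max stages")


def average_sample_number(procedure):
    return sum(n * p for n, p in procedure.items())


if __name__ == "__main__":
    proc = two_point_optimum(0.95, omega)
    print(proc, average_sample_number(proc))
```

Any other pair of sample sizes giving the same overall confidence coefficient should, by the theorem, have a larger average sample number; this can be confirmed by enumeration over a finite range.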


REFERENCES 


Anscombe, F. J. (1949). Large-sample theory of sequential estimation. Biometrika, 36, 455-8. 

Anscombe, F. J. (1953). Sequential estimation. J. R. Statist. Soc. B, 15, 1-29. 

Beale, E. M. L. (1955). On the minimization of convex functions subject to linear inequalities. J. R. 
Statist. Soc. B, 17, 173-84. 

Stein, C. M. & Wald, A. (1947). Sequential confidence intervals for the mean of a normal distribution 
with known variance. Ann. Math. Statist. 18, 427-33. 

Wolfowitz, J. (1950). Minimax estimates of the mean of a Normal distribution with known variance. 
Ann. Math. Statist. 21, 218-30. 










Estimation of means of normal populations from observed minima 


By H. A. DAVID 


University of Melbourne 


1. INTRODUCTION 


In life tests it is often advantageous to stop the experiment after some preassigned number of failures, 
say $k$, has been observed among $m$ objects under test, and there are other situations where it is either 
impossible or impracticable to obtain more than the $k$ smallest readings. When the characteristic 
measured follows an exponential distribution the problems raised are fairly simple and have been studied in 
detail (see, for example, Epstein & Sobel, 1953). For a normal population, to which we confine 
consideration in this paper, Gupta (1952) has shown how maximum likelihood as well as linear estimators 
of the mean $\mu$ and the standard deviation $\sigma$ may be found, provided $k \geq 2$. 

However, it may happen that only the smallest observation can ever be obtained, due to an inherent 
breakdown of the system after one failure. In such a case several experiments will evidently be needed 
for the estimation of $\mu$ and $\sigma$. An example, which has led to the present investigation, is the failure due 
to fatigue of an aeroplane wing which is subjected to repeated cycles of a load under which it eventually 
breaks, at random, in one of two positions, $A$ or $B$. Here the problem is to estimate the breaking strengths, 
$\mu_A$ and $\mu_B$, at $A$ and $B$ (expressed as the logarithm of the number of cycles to failure) from values observed 
when $n$ similar wings are tested; the $n$ breaking points are, of course, also known. It will be seen that this 
example introduces an additional feature in that $\mu_A$ and $\mu_B$ cannot generally be assumed equal. Before 
dealing with it we therefore consider the case when there are several possible breaking points which, 
for reasons of symmetry, may be taken to be of equal strength. This is identical with the problem of 
estimating $\mu$ and $\sigma$ from $n$ observations each of which is the minimum of $m$ independent $N(\mu, \sigma^2)$ variates. 


2. THE SYMMETRICAL CASE 


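Let $x_i$ $(i = 1, 2, \dots, n)$ denote the smallest of $m$ independent $N(\mu, \sigma^2)$ variates. Assuming the $x_i$ to be
independent we shall consider the estimation of $\mu$ and $\sigma$ both by the method of maximum likelihood
and the method of moments.

The likelihood function $L(\mu, \sigma)$ is given by

$$L(\mu,\sigma) = \prod_{i=1}^{n}\left\{\frac{m}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]\left[\int_{x_i}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]dt\right]^{m-1}\right\}.$$

Let
$$u_i = (x_i-\mu)/\sigma, \qquad z(u_i) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 u_i^2},$$
$$Q(u_i) = \int_{u_i}^{\infty} z(t)\,dt \quad\text{and}\quad A_i = z(u_i)/Q(u_i). \tag{1}$$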


Then $\log L$ may be written

$$\log L = C - n\log\sigma - \tfrac12\sum_{i=1}^{n}u_i^2 + (m-1)\sum_{i=1}^{n}\log Q(u_i),$$

where $C$ is a constant. The maximum-likelihood equations are

$$\sigma\,\partial\log L/\partial\mu = \sum u_i + (m-1)\sum A_i = 0 \tag{2}$$
and
$$\sigma\,\partial\log L/\partial\sigma = -n + \sum u_i^2 + (m-1)\sum A_i u_i = 0. \tag{3}$$

These equations may be solved numerically for $\mu$ and $\sigma$ with the help of tables of the $A_i$ (K. Pearson,
1931, Table II; also Editorial, 1955, Biometrika, 43, 217).
However, for $n$ large it is much easier to estimate $\mu$ and $\sigma$ by the method of moments. We have

$$\mathcal{E}(x_i) = \mu - \sigma\xi_m, \qquad \operatorname{var} x_i = \sigma^2 V_m,$$

where $\xi_m$, $V_m$ are respectively the mean and variance of the largest of $m$ unit normal deviates. Unbiased
moment estimators are therefore given by

$$\sigma^{*2} = \sum (x_i-\bar x)^2/[(n-1)V_m]$$

and
$$\mu^* = \bar x + \xi_m\sigma^*/c_n,$$

where $\bar x = \sum x_i/n$ and $c_n = 1-\tfrac18(\beta_2-1)/n+\dots$; $c_n$ is the factor making $\sigma^*/c_n$ an unbiased estimator of $\sigma$,
while $\beta_2$ is the coefficient of kurtosis of the extreme. $\xi_m$ and $V_m$ as well as the coefficients $\beta_1$, $\beta_2$ of the
largest of $m$ normal deviates have been tabulated by Ruben (1954) for $m \le 50$.
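The moment estimators can be sketched numerically as follows. This is only an illustration under stated assumptions: instead of Ruben's tables, $\xi_m$ and $V_m$ are obtained by the trapezoidal rule, the bias factor $c_n$ is taken as 1 (large $n$), and the function names `extreme_mean_var` and `moment_estimates` are hypothetical.

```python
import math


def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_cdf(u):
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def extreme_mean_var(m, lo=-10.0, hi=10.0, steps=20000):
    """Mean xi_m and variance V_m of the largest of m unit normal deviates.

    Uses the trapezoidal rule on the density m * z(u) * Phi(u)**(m - 1).
    """
    h = (hi - lo) / steps
    s0 = s1 = s2 = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        d = w * h * m * normal_pdf(u) * normal_cdf(u) ** (m - 1)
        s0 += d
        s1 += d * u
        s2 += d * u * u
    mean = s1 / s0
    return mean, s2 / s0 - mean * mean


def moment_estimates(minima, m):
    """Moment estimates (mu*, sigma*) from n observed minima of m variates.

    Assumption: the factor c_n is set to 1, i.e. the large-n form.
    """
    n = len(minima)
    xi, v = extreme_mean_var(m)
    xbar = sum(minima) / n
    ss = sum((x - xbar) ** 2 for x in minima)
    sigma = math.sqrt(ss / ((n - 1) * v))
    mu = xbar + xi * sigma
    return mu, sigma
```

For $m = 2$ the quadrature can be checked against the closed forms $\xi_2 = 1/\sqrt{\pi}$ and $V_2 = 1 - 1/\pi$.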

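We now compare the large sample variances of $\mu^*$ and $\sigma^*$ with those of the maximum-likelihood
estimators $\hat\mu$ and $\hat\sigma$. The latter are, of course, only asymptotically unbiased. From (2) and (3) it follows
that

$$-\frac{\sigma^2}{n}\frac{\partial^2\log L}{\partial\mu^2} = 1 + \frac{m-1}{n}\sum A_i(A_i-u_i),$$

$$-\frac{\sigma^2}{n}\frac{\partial^2\log L}{\partial\mu\,\partial\sigma} = \frac{2}{n}\sum u_i + \frac{m-1}{n}\sum A_i(1+A_iu_i-u_i^2),$$

$$-\frac{\sigma^2}{n}\frac{\partial^2\log L}{\partial\sigma^2} = -1 + \frac{3}{n}\sum u_i^2 + \frac{m-1}{n}\sum A_iu_i(2+A_iu_i-u_i^2).$$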


Table 1. Large-sample variances, in terms of $\sigma^2/n$, of maximum-likelihood and moment estimators
($\hat\mu$, $\hat\sigma$ and $\mu^*$, $\sigma^*$) derived from $n$ minima of $m$ independent $N(\mu, \sigma^2)$ variates

   m  | var $\hat\mu$ | var $\mu^*$ | var $\hat\sigma$ | var $\sigma^*$
   1  |  1.000  |  1.000  |  0.500  |  0.500
   2  |  0.781  |  0.782  |  0.509  |  0.515
   3  |  0.796  |  0.806  |  0.514  |  0.532
   4  |  0.854  |  0.874  |  0.517  |  0.541
   5  |  0.922  |  0.956  |  0.520  |  0.550
   6  |  0.991  |  1.041  |  0.522  |  0.558
   7  |  1.057  |  1.124  |  0.523  |  0.566
   8  |  1.120  |  1.204  |  0.525  |  0.572
   9  |  1.180  |  1.281  |  0.526  |  0.578
  10  |  1.236  |  1.354  |  0.527  |  0.583
  12  |  1.340  |  1.491  |  0.529  |  0.592
  15  |  1.477  |  1.675  |  0.531  |  0.603
  20  |  1.668  |  1.937  |  0.533  |  0.617
  30  |  1.960  |  2.350  |  0.537  |  0.637
  40  |  2.183  |  2.672  |  0.539  |  0.651
  50  |  2.362  |  2.935  |  0.541  |  0.661








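To obtain expected values we require terms of the type

$$\mathcal{E}(u^rA) \quad\text{and}\quad \mathcal{E}(u^rA^2).$$

Since the density function of $u$ is

$$f(u) = m\,z(u)\,Q^{m-1}(u),$$

we have
$$\mathcal{E}(u^rA) = m\int_{-\infty}^{\infty}u^r z^2 Q^{m-2}\,du
= \frac{m}{m-1}\int_{-\infty}^{\infty}(ru^{r-1}z-u^{r+1}z)\,Q^{m-1}\,du
= \frac{1}{m-1}\,[r\,\mathcal{E}(u^{r-1})-\mathcal{E}(u^{r+1})].$$

Similarly, by two integrations by parts, we find for $m>2$

$$\mathcal{E}(u^rA^2) = \frac{1}{(m-1)(m-2)}\,[r(r-1)\,\mathcal{E}(u^{r-2})-(3r+2)\,\mathcal{E}(u^r)+2\,\mathcal{E}(u^{r+2})].$$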








Thus in all these cases the covariance matrix of $\hat\mu$ and $\hat\sigma$ can be obtained in terms of the first four moments
of the extreme. For $m = 2$ numerical integration is necessary for the evaluation of $\mathcal{E}(u^rA^2)$.
The variances for large $n$ of $\mu^*$, $\sigma^*$ can be obtained by standard methods. We find

$$\operatorname{var}\mu^* = \frac{\sigma^2}{n}\,[V_m+\tfrac14\xi_m^2(\beta_2-1)-\xi_m\sqrt{(V_m\beta_1)}],$$

$$\operatorname{var}\sigma^* = \frac{\sigma^2}{4n}\,(\beta_2-1).$$

In Table 1 the large sample variances of $\hat\mu$, $\mu^*$ and $\hat\sigma$, $\sigma^*$ are compared for $m = 2(1)10, 12, 15, 20, 30, 40, 50$.
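The moment-estimator columns of Table 1 can be reproduced approximately from these two formulae. The following sketch assumes that the moments of the largest of $m$ unit normal deviates are found by numerical integration rather than from Ruben's tables, and that $\sqrt{\beta_1}$ is taken as the (non-negative) skewness of the largest deviate; the function names are hypothetical.

```python
import math


def largest_moments(m, lo=-10.0, hi=10.0, steps=20000):
    """Return (xi_m, V_m, sqrt(beta_1), beta_2) for the largest of m unit normals."""
    h = (hi - lo) / steps
    raw = [0.0] * 5  # raw[r] accumulates the r-th raw moment
    for i in range(steps + 1):
        u = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * math.erfc(-u / math.sqrt(2.0))
        d = w * h * m * pdf * cdf ** (m - 1)
        power = 1.0
        for r in range(5):
            raw[r] += d * power
            power *= u
    m1, m2, m3, m4 = (raw[r] / raw[0] for r in range(1, 5))
    v = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return m1, v, mu3 / v ** 1.5, mu4 / v ** 2


def moment_variances(m):
    """Large-sample var(mu*) and var(sigma*) in units of sigma^2/n."""
    xi, v, root_beta1, beta2 = largest_moments(m)
    var_mu = v + 0.25 * xi ** 2 * (beta2 - 1.0) - xi * math.sqrt(v) * root_beta1
    var_sigma = 0.25 * (beta2 - 1.0)
    return var_mu, var_sigma
```

For $m = 1$ this gives 1 and 0.5, and for $m = 2$ values close to the tabulated 0.782 and 0.515.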


3. CASE OF TWO UNEQUAL MEANS 

In this section we are concerned with the estimation of the means $\mu_x$ and $\mu_y$ of two normal populations
with common variance $\sigma^2$ from a knowledge of the smaller of each of $n$ pairs of observations $x_i$ and $y_i$
$(i = 1, 2, \dots, n)$. It will be assumed that all $x$'s and $y$'s are independent and in addition that the
population from which any observed minimum comes can be identified. Suppose that $k$ $(<n)$ minima are $x$'s
and $n-k$ are $y$'s. These observations may be regarded as the outcome of mutual stochastic censoring.
With a slight change in notation we denote them by $x_i$ $(i = 1, 2, \dots, k)$ and $y_j$ $(j = 1, 2, \dots, n-k)$. The
logarithm of the likelihood function is

$$\log L = C - n\log\sigma - \tfrac12\left(\sum u_i^2+\sum v_j^2\right) + \sum\log Q(u_i') + \sum\log Q(v_j'),$$
where
$$u_i = (x_i-\mu_x)/\sigma, \qquad v_j = (y_j-\mu_y)/\sigma,$$
$$u_i' = (x_i-\mu_y)/\sigma, \qquad v_j' = (y_j-\mu_x)/\sigma,$$
and $Q$ is defined as in (1). With $A_i = z(u_i')/Q(u_i')$ and $B_j = z(v_j')/Q(v_j')$ the likelihood equations may be
written
$$\hat\mu_x = \bar x + \frac{\hat\sigma}{k}\sum B_j, \tag{4}$$
$$\hat\mu_y = \bar y + \frac{\hat\sigma}{n-k}\sum A_i, \tag{5}$$
and
$$n = \frac{1}{\hat\sigma^2}\left[\sum(x_i-\bar x)^2 + k(\bar x-\hat\mu_x)^2 + \sum(y_j-\bar y)^2 + (n-k)(\bar y-\hat\mu_y)^2\right] + \sum A_iu_i' + \sum B_jv_j'. \tag{6}$$
o 


For n not too large these equations may again be solved numerically. 
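One way of carrying out such a numerical solution is sketched below. It is not the method used for the paper's calculations: it simply maximizes $\log L$ by a coordinate (pattern) search and then evaluates the residuals of the likelihood equations (4)-(6), which should be close to zero at the maximum. The data are the wing observations quoted in § 4; the starting values and the function names are assumptions made for the illustration.

```python
import math

# Observed minima from the numerical example of section 4.
X_MIN = [0.1959, 0.3856, 0.1647, 0.3655, 0.3410]
Y_MIN = [0.2519, -0.0667, 0.2842, 0.0380, 0.5197, 0.2142,
         0.2175, 0.3474, 0.3674, 0.0789, 0.1642]


def z(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Q(u):
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def log_lik(mu_x, mu_y, sigma, xs, ys):
    """Log-likelihood (up to the constant C) for mutually censored minima."""
    if sigma <= 0:
        return -math.inf
    total = -(len(xs) + len(ys)) * math.log(sigma)
    terms = [(x, mu_x, mu_y) for x in xs] + [(y, mu_y, mu_x) for y in ys]
    for obs, own, other in terms:
        tail = Q((obs - other) / sigma)
        if tail <= 0:
            return -math.inf
        total += -0.5 * ((obs - own) / sigma) ** 2 + math.log(tail)
    return total


def maximise(xs, ys, tol=1e-6):
    """Coordinate search for (mu_x, mu_y, sigma) maximizing log_lik."""
    n = len(xs) + len(ys)
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    pooled = math.sqrt((sum((x - xbar) ** 2 for x in xs)
                        + sum((y - ybar) ** 2 for y in ys)) / n)
    theta = [xbar, ybar, pooled]  # assumed starting values
    best = log_lik(*theta, xs, ys)
    step = 0.1
    while step > tol:
        improved = False
        for i in range(3):
            for delta in (step, -step):
                trial = list(theta)
                trial[i] += delta
                value = log_lik(*trial, xs, ys)
                if value > best:
                    theta, best, improved = trial, value, True
        if not improved:
            step *= 0.5
    return tuple(theta)


def likelihood_residuals(mu_x, mu_y, sigma, xs, ys):
    """sigma times the partial derivatives of log L; zero when (4)-(6) hold."""
    a = [z((x - mu_y) / sigma) / Q((x - mu_y) / sigma) for x in xs]  # A_i
    b = [z((y - mu_x) / sigma) / Q((y - mu_x) / sigma) for y in ys]  # B_j
    r_x = sum((x - mu_x) / sigma for x in xs) + sum(b)
    r_y = sum((y - mu_y) / sigma for y in ys) + sum(a)
    r_s = (-(len(xs) + len(ys))
           + sum(((x - mu_x) / sigma) ** 2 for x in xs)
           + sum(((y - mu_y) / sigma) ** 2 for y in ys)
           + sum(ai * (x - mu_y) / sigma for ai, x in zip(a, xs))
           + sum(bj * (y - mu_x) / sigma for bj, y in zip(b, ys)))
    return r_x, r_y, r_s
```

Calling `maximise(X_MIN, Y_MIN)` returns the three estimates, and `likelihood_residuals` applied to them gives a direct check on the solution.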
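In this case the moment approach is not so simple as in the preceding section. We note that $\bar x$ and $\bar y$ are
estimators of $\mathcal{E}(x \mid x<y)$ and $\mathcal{E}(y \mid y<x)$, and therefore evaluate these two expectations in terms of the
three parameters.

We have
$$f(x \mid x<y) = \int_x^{\infty}f(x,y)\,dy\Big/\Pr(x<y) = f(x)\int_x^{\infty}f(y)\,dy\Big/Q(\xi),$$
where $\xi = (\mu_x-\mu_y)/(\sqrt2\,\sigma)$. It may be observed that if $\mu_x = \mu_y$ then
$$f(x \mid x<y) = 2f(x)\int_x^{\infty}f(y)\,dy,$$
which is just the density function of the smaller of two identically distributed variates.
Now, writing $t = (x-\mu_x)/\sigma$,

$$Q(\xi)\,\mathcal{E}(x \mid x<y) = \int_{-\infty}^{\infty}x f(x)\int_x^{\infty}f(y)\,dy\,dx
= \int_{-\infty}^{\infty}f(y)\int_{-\infty}^{y}x f(x)\,dx\,dy$$

$$= \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(y-\mu_y)^2}{2\sigma^2}\right]\int_{-\infty}^{(y-\mu_x)/\sigma}\frac{1}{\sqrt{2\pi}}(\mu_x+\sigma t)\,e^{-\frac12t^2}\,dt\,dy$$

$$= \mu_xQ(\xi) - \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left[-\frac{(y-\mu_y)^2+(y-\mu_x)^2}{2\sigma^2}\right]dy$$

$$= \mu_xQ(\xi) - \frac{\sigma}{2\sqrt\pi}\exp\left[-\frac{(\mu_x-\mu_y)^2}{4\sigma^2}\right].$$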





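Writing $A_\xi$ for $z(\xi)/Q(\xi)$ we have

$$\mathcal{E}(x \mid x<y) = \mu_x - \frac{\sigma}{\sqrt2}A_\xi, \tag{7}$$
and by symmetry
$$\mathcal{E}(y \mid y<x) = \mu_y - \frac{\sigma}{\sqrt2}A_{-\xi}. \tag{8}$$
Similarly, we find
$$\operatorname{var}(x \mid x<y) = \sigma^2(1+\tfrac12\xi A_\xi-\tfrac12A_\xi^2) \tag{9}$$
and
$$\operatorname{var}(y \mid y<x) = \sigma^2(1-\tfrac12\xi A_{-\xi}-\tfrac12A_{-\xi}^2). \tag{10}$$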


Equations (9) and (10) could be combined to provide a third estimating equation. However, it seems
preferable and simpler to take instead $k/n$ as an estimator of $\Pr(x<y)$, i.e.

$$Q(\xi^*) = k/n, \tag{11}$$

which gives $\xi^*$ immediately and hence $\sigma^*$, $\mu_x^*$, $\mu_y^*$ from

$$\bar x = \mu_x^* - \frac{\sigma^*}{\sqrt2}A_{\xi^*} \tag{7'}$$
and
$$\bar y = \mu_y^* - \frac{\sigma^*}{\sqrt2}A_{-\xi^*}. \tag{8'}$$

There is one point worth noting here. From (7') and (8') we have an equation for $\sigma^*$, viz.
$$\sqrt2(\bar x-\bar y)/\sigma^* = 2\xi^* - (A_{\xi^*}-A_{-\xi^*}). \tag{12}$$

The right-hand side can be shown to have the same sign as $\xi^*$. Therefore, if $\xi^*$ is positive (i.e. $\mu_x^* > \mu_y^*$)
but $\bar x$ is less than $\bar y$, then (12) will have no satisfactory solution; likewise if $\xi^* < 0$ but $\bar x > \bar y$. This situation
may arise in small samples or if $\mu_x$ and $\mu_y$ are not very different, and suggests that the simpler model
$\mu_x = \mu_y$ is adequate. Even without finding such a contradiction we may, especially in large samples,
decide in favour of the simpler model if $k/n$ does not differ significantly from $\tfrac12$.


4. A NUMERICAL EXAMPLE 


Sixteen similar aeroplane wings were in turn subjected to repeated applications of the same load until
breaking in one of two positions, A or B, five breaking at A and eleven at B. A and B were so far apart
that the strengths at A and B may be regarded as independent. Let $x_i$ and $y_j$ denote the observed
breaking-strengths at A and B respectively. The following values (measured in logarithmic units from a suitable
origin) were obtained:

$x_i$: 0.1959, 0.3856, 0.1647, 0.3655, 0.3410;
$y_j$: 0.2519, -0.0667, 0.2842, 0.0380, 0.5197, 0.2142, 0.2175, 0.3474, 0.3674, 0.0789, 0.1642.

Equations (4)-(6) give the maximum-likelihood estimates of $\mu_x$, $\mu_y$ and $\sigma$ as
$$\hat\mu_x = 0.421, \quad \hat\mu_y = 0.284 \quad\text{and}\quad \hat\sigma = 0.166.$$
The corresponding moment estimates are, from (11), (12), (7') and (8'),
$$\mu_x^* = 0.514, \quad \mu_y^* = 0.321 \quad\text{and}\quad \sigma^* = 0.279.$$
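The moment estimates can be retraced step by step from (11), (12), (7') and (8'). The sketch below uses Python's `statistics.NormalDist` for the normal integral and its inverse; the function name `two_mean_moment_estimates` is hypothetical, and the data are those listed above.

```python
import math
from statistics import NormalDist, mean

X_MIN = [0.1959, 0.3856, 0.1647, 0.3655, 0.3410]
Y_MIN = [0.2519, -0.0667, 0.2842, 0.0380, 0.5197, 0.2142,
         0.2175, 0.3474, 0.3674, 0.0789, 0.1642]


def two_mean_moment_estimates(xs, ys):
    """Return (mu_x*, mu_y*, sigma*) from equations (11), (12), (7') and (8')."""
    unit = NormalDist()
    k, n = len(xs), len(xs) + len(ys)
    xi = unit.inv_cdf(1.0 - k / n)          # solves Q(xi*) = k/n, equation (11)
    a_pos = unit.pdf(xi) / (k / n)          # A at xi*:  z(xi*)/Q(xi*)
    a_neg = unit.pdf(xi) / (1.0 - k / n)    # A at -xi*: z(xi*)/Q(-xi*)
    rhs = 2.0 * xi - (a_pos - a_neg)        # right-hand side of (12)
    diff = mean(xs) - mean(ys)
    if rhs == 0.0 or diff / rhs <= 0.0:
        # Signs disagree: (12) has no satisfactory solution.
        raise ValueError("equation (12) has no positive solution for sigma*")
    sigma = math.sqrt(2.0) * diff / rhs
    mu_x = mean(xs) + sigma * a_pos / math.sqrt(2.0)   # from (7')
    mu_y = mean(ys) + sigma * a_neg / math.sqrt(2.0)   # from (8')
    return mu_x, mu_y, sigma
```

With $k/n = 5/16$ this gives $\xi^* \approx 0.489$, and the three estimates agree with the quoted values to the three decimals given.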


The sample size $n = 16$ is in this case not really sufficiently large for the simple method of estimation
by moments to be altogether satisfactory, especially for $\sigma$. However, the estimation of $\sigma$ was of
subsidiary interest as the purpose of the experiment was to compare the estimates of $\mu_x$ and $\mu_y$ obtained
from tests on wings with the results of fatigue tests on different specimens.


I am indebted to Mr D. G. Ford, of the Aeronautical Research Laboratories, Victoria, for referring to 
me the problem treated in §§ 3-4. Further details on applications are given in a report by Ford & Payne 
(1957). Dr E. J. Williams and Prof. E. S. Pearson made some valuable comments on an earlier version 
of this paper. Table 1 was computed by Miss B. Laby and Miss G. E. Jacobs of the University of 
Melbourne. 










REFERENCES 


Epstein, B. & Sobel, M. (1953). Life testing. J. Amer. Statist. Ass. 49, 486.

Ford, D. G. & Payne, A. O. (1957). The fatigue characteristics of a riveted 24 S-T wing. Part IV.
Analysis of results. Aeronautical Research Laboratories (Victoria) Structures Report.

Gupta, A. K. (1952). Estimation of the mean and standard deviation of a normal population from
a censored sample. Biometrika, 39, 260.

Pearson, K. (1931). Tables for Statisticians and Biometricians, Part 2. Cambridge University Press.

Ruben, H. (1954). On the moments of order statistics in samples from normal populations.
Biometrika, 41, 200.


The power of the Poisson index of dispersion 


By J. H. DARWIN 
Department of Scientific and Industrial Research, New Zealand 


1. The distribution of the index of dispersion, $Z = \sum(x_j-\bar x)^2/\bar x$, for a sample of independent variables
$x_1, \dots, x_n$, drawn from a Poisson distribution with a high mean, is known to be approximately that of a
$\chi^2$ with $n-1$ degrees of freedom, say $\chi^2_{n-1}$. It is the object of this note to find the limiting distribution
of $Z$ when the parent distribution of the $x$'s is not Poisson but has a high mean $\mu$, a variance of order $\mu$
and cumulants $\kappa_r$ which are $o(\mu^{\frac12r})$ for $r>2$.

This problem, which arises in considering the power of the $\chi^2_{n-1}$ test of $Z$ against the alternatives
described above, has been tackled in part by Bateman (1950) and Kathirgamatamby (1953). Bateman
discusses Neyman's contagious distribution as an alternative. She finds the first four moments of $Z$
and shows that for a large sample size or a large mean these moments tend in form to those of $(1+m_2)\chi^2_{n-1}$,
where $1+m_2$ is the ratio (variance)/(mean) for Neyman's distribution. Kathirgamatamby discusses
the variable $u = \sum(x_j-\bar x)^2-(\sum x_j)\chi^2_{n-1,\alpha}/n$, where $\chi^2_{n-1,\alpha}$ is such that $\Pr(\chi^2_{n-1}>\chi^2_{n-1,\alpha}) = \alpha$. He finds
the first four cumulants of $u$ and fits distributions to these when the alternatives to the Poisson are
Thomas's distribution, the negative binomial and Neyman's contagious distribution of Type A.

We shall prove that Bateman's particular result holds for the class of alternatives shown above, and
we shall find the accuracy of the resultant expression for the power of $Z$ to the next order in $1/\mu$.


2. $Z$ may be written $(\sum(x_j-\bar x)^2/\mu)/(\bar x/\mu)$. Suppose the distribution function of $x_j$ has cumulants
$\kappa_1\,(=\mu), \kappa_2, \dots$. Then the joint characteristic function of $x_j-\bar x$ $(j = 1, \dots, n)$ is
$$E\exp\left[\sum i\theta_j(x_j-\bar x)\right] = E\exp\left[i\sum x_j(\theta_j-\bar\theta)\right].$$
The joint cumulant generating function of the $(x_j-\bar x)/\sqrt\mu$ is then
$$\frac{\mu i\sum(\theta_j-\bar\theta)}{\sqrt\mu} + \frac{\kappa_2i^2}{2!\,\mu}\sum(\theta_j-\bar\theta)^2 + \dots + \frac{\kappa_ri^r}{r!\,\mu^{\frac12r}}\sum(\theta_j-\bar\theta)^r + \dots. \tag{1}$$
Hence if $\kappa_r = o(\mu^{\frac12r})$ for $r = 3, 4, \dots$, this tends to $-\kappa_2\sum(\theta_j-\bar\theta)^2/(2\mu)$, the cumulant generating function
of $n$ normal variables with a variance matrix of rank $n-1$. It follows easily by the use of an orthogonal
transformation of the $(x_j-\bar x)/\sqrt\mu$ that the limiting distribution of $\sum(x_j-\bar x)^2/\kappa_2$ is that of $\chi^2_{n-1}$. Again, by
Tchebycheff's theorem, $\bar x/\mu$ tends in probability to 1 as $\mu$ becomes large. Hence by a convergence theorem
of Cramér's (1946, p. 254) the limiting distribution of $Z\mu/\kappa_2$ for large $\mu$ is that of $\chi^2_{n-1}$.

This derivation holds for example for Neyman's contagious distribution of Type A, a negative
binomial, and Thomas's distribution when each of these has a large mean $\mu$ and a ratio (variance)/(mean)
which is $O(1)$ in $\mu$.

If an alternative to the Poisson has a compound form with probability generating function
$E_\lambda\exp[\lambda(z-1)]$, where $\lambda$ has a distribution $dF(\lambda)$, then its cumulants $\kappa_r$ are linear functions of those,
$k_r$, of $dF(\lambda)$. The condition that $\kappa_1$ and $\kappa_2$ are large and of the same order and $\kappa_r/\kappa_1^{\frac12r}$ tends to 0 as $\kappa_1$
becomes large for $r = 3, 4, \dots$, is implied by the same condition in the $k_r$'s. This same condition in the
$k_r$'s is then sufficient for the limiting distribution of $Z$ to be of the form found above.





3. The discrete alternatives we are considering may in standard form be approximated to by an
Edgeworth series for the sake of finding the joint sampling distribution of the mean and variance.
Gayen (1949), using a method of Bartlett's (1935), found the joint distribution of $S_1 = \sum_{j=1}^{n}y_j$ and
$S_2 = \sum_{j=1}^{n}(y_j-\bar y)^2$, where the $y$'s are drawn independently from a universe with zero mean, unit standard
deviation and all cumulant terms of higher order than those in $\lambda_3^2$ and $\lambda_4$ neglected, where $\lambda_r = \kappa_r/\kappa_2^{\frac12r}$.
From this joint distribution it is easy to calculate $\Pr(Z>\chi^2_{n-1,\alpha})$ to this order of approximation in the $\lambda$'s.

For simplicity write $\chi^2_{n-1,\alpha} = X$ and set

$$u = \sum_j(x_j-\bar x)^2-\bar xX.$$

If the observations $x_j$ have been drawn from one of the alternative populations with mean $\kappa_1 = \mu$ and
variance $\kappa_2 = a\mu$, then in terms of the standardized variables $y$ and Gayen's $S_1$ and $S_2$
$$u = S_2a\mu - X\mu - XS_1(a\mu)^{\frac12}/n.$$

The power of the index of dispersion test based on $Z$, that is, the probability of establishing significance
at the $100\alpha\,\%$ level when $a>1$, is therefore $\Pr(u>0)$.

Gayen's first approximation to the joint distribution of $S_1$ and $S_2$ is
$$f(S_1,S_2) = W(n-1) = \frac{e^{-S_1^2/(2n)}}{\sqrt{(2\pi n)}}\cdot\frac{S_2^{\frac12(n-3)}e^{-\frac12S_2}}{2^{\frac12(n-1)}\Gamma\{\frac12(n-1)\}}. \tag{2}$$
If now we change our variables from $S_1$ and $S_2$ to $S_1$ and $w$, where
$$w = S_2 - \frac{XS_1}{n\sqrt{(a\mu)}} = \frac{u+X\mu}{a\mu},$$
and integrate out for $S_1$, we find for the leading term of the distribution of $w$
$$f(w) = \frac{w^{\frac12(n-3)}e^{-\frac12w}}{2^{\frac12(n-1)}\Gamma\{\frac12(n-1)\}},$$
i.e. $w$ is approximately distributed as $\chi^2$ with $n-1$ degrees of freedom. Hence our first approximation
to the power is
$$\Pr(Z>X) = \Pr(u>0) = \Pr(w>X/a) = \Pr(\chi^2_{n-1}>\chi^2_{n-1,\alpha}/a),$$
which is the result reached in § 2 for the term of zero order in $\mu^{-1}$. There is no term of order $\mu^{-\frac12}$ in the
distribution of $w$; that in $\mu^{-1}$ has contributions from the expansion of
$$S_2^{\frac12(n-3)}e^{-\frac12S_2} = \left\{w+\frac{XS_1}{n\sqrt{(a\mu)}}\right\}^{\frac12(n-3)}e^{-\frac12w}\,e^{-\frac12XS_1/\{n\sqrt{(a\mu)}\}}$$
in powers of $\mu^{-\frac12}$ and from the appropriate terms in Gayen's coefficient of $W(n-1)$.
If for convenience we write
$$F_\nu = \Pr(\chi^2_\nu>\chi^2_{n-1,\alpha}/a),$$
easy calculations give for the power of the test based on $Z$ to order $\mu^{-1}$, and for $n>5$
Pr(Z> x3-1,4) = Pr(Z>X) 
2 _ 
= 4 n-1 + = (Fra er 2F,,-s +F,-s) 7 Ms 
Az(n— 1) (n—2) 
12n 
Taking the significance level as $\alpha = 0.05$, some calculations have been made from this formula for
$a = 1.5$ and 2 for each of Thomas's and Neyman's distributions and for the negative binomial distribution.
For these three distributions $\lambda_3\sqrt\mu$ and $\lambda_4\mu$ are functions of $a$ only.

The $\lambda$ values for the alternative distributions were as shown in Table 1, while the values for the power
are given in Table 2, together with the approximations derived by Kathirgamatamby.
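That $\lambda_3\sqrt\mu$ and $\lambda_4\mu$ depend on $a$ only can be illustrated by a short calculation. The sketch below rests on a standard fact not spelled out in the text, namely that for a Poisson number of clusters the cumulant $\kappa_r$ is proportional to the $r$-th raw moment of the cluster size $Y$; Neyman's Type A is taken with $Y$ Poisson, Thomas's distribution with $Y = 1 +$ Poisson, and the negative binomial is handled through its known cumulants. The function names are hypothetical.

```python
import math


def poisson_raw_moments(lam):
    """Raw moments E(P^j), j = 0..4, of a Poisson variable with mean lam."""
    return [1.0,
            lam,
            lam + lam ** 2,
            lam + 3 * lam ** 2 + lam ** 3,
            lam + 7 * lam ** 2 + 6 * lam ** 3 + lam ** 4]


def lambdas_from_cluster_moments(ey):
    """(lambda_3 sqrt(mu), lambda_4 mu) given ey[r] = E(Y^r), r = 1..4."""
    a = ey[2] / ey[1]                       # ratio variance/mean
    return (ey[3] / ey[1]) / a ** 1.5, (ey[4] / ey[1]) / a ** 2


def neyman_lambdas(a):
    # Cluster size Y is Poisson with mean m2 = a - 1.
    return lambdas_from_cluster_moments(poisson_raw_moments(a - 1.0))


def thomas_lambdas(a):
    # Cluster size Y = 1 + P, P Poisson(lam); lam solves (1+3lam+lam^2)/(1+lam) = a.
    lam = (-(3.0 - a) + math.sqrt((3.0 - a) ** 2 + 4.0 * (a - 1.0))) / 2.0
    p = poisson_raw_moments(lam)
    ey = [sum(math.comb(r, j) * p[j] for j in range(r + 1)) for r in range(5)]
    return lambdas_from_cluster_moments(ey)


def negative_binomial_lambdas(a):
    # kappa_3 = mu a (2a - 1), kappa_4 = mu a (6a^2 - 6a + 1), kappa_2 = a mu.
    return (2.0 * a - 1.0) / math.sqrt(a), (6.0 * a * a - 6.0 * a + 1.0) / a
```

For $a = 1.5$ and $a = 2$ these functions return the entries of Table 1 to the number of figures given there.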





(Faia 2F 4 +f») 


Ay(n— 1) 


2 
sa 8n (Fass — 2F asia + Pas) + (Fis — 3Pis3 t+ 3F aya — n-1)° (3) 











Table 1

                      |      Case a = 1.5      |      Case a = 2.0
Alternative           | $\lambda_3\sqrt\mu$ | $\lambda_4\mu$ | $\lambda_3\sqrt\mu$ | $\lambda_4\mu$
Thomas                |  1.47075  |  2.5714  |  1.71618  |  3.4496
Neyman                |  1.49691  |  2.7222  |  1.76777  |  3.7500
Negative binomial     |  1.63299  |  3.6667  |  2.12132  |  6.5000
















The differences, with respect to $\nu$, of the functions $F_\nu$ were found by the use of the formulae


4(F, 


F,(x) _ 


F(x) + ,_s(2) = 2 


-2(v) — F,(x)) = 


0x 


OF (x) 





OF,_2(x) 
Ox 


=[2@ Jeter 5+). 


> ed 


Ox 





(4)

Table 2. Approximations to the power of the index of dispersion test
(The figures in parentheses are Kathirgamatamby's values.)

                |              a = 1.5                  |              a = 2.0
                | μ = 1  |   5    |   10   | Main term  | μ = 1  |   5    |   10   | Main term
n = 101
Thomas          | 0.874  | 0.889  | 0.890  |            | 0.999  | 0.999  | 0.999  |
                |(0.870) |(0.888) |(0.890) |            |(0.999) |(1.000) |(0.999) |
Neyman          | 0.869  | 0.888  | 0.890  |   0.892    | 0.999  | 0.999  |   …    |    …
                |(0.865) |(0.887) |(0.890) |            |(0.999) |(0.999) |   …    |
Neg. binomial   | 0.834  | 0.881  | 0.887  |            | 0.996  | 0.998  | 0.999  |
                |(0.834) |(0.880) |(0.886) |            |(0.988) |(0.997) |(0.999) |
n = 51
Thomas          | 0.636  | 0.666  | 0.670  |            | 0.962  | 0.962  | 0.962  |
                |(0.635) |(0.667) |(0.669) |            |(0.956) |(0.960) |   …    |
Neyman          | 0.629  | 0.665  | 0.669  |     …      |   …    |   …    |   …    |    …
                |(0.665) |(0.666) |(0.669) |            |   …    |   …    |   …    |
Neg. binomial   | 0.588  | 0.657  | 0.665  |            | 0.910  | 0.952  | 0.957  |
                |(0.606) |(0.659) |(0.666) |            |(0.898) |(0.949) |(0.955) |
n = 20
Thomas          | 0.337  | 0.379  | 0.384  |            | 0.604  | 0.713  | 0.716  |
Neyman          | 0.333  | 0.378  | 0.383  |   0.389    | 0.682  | 0.711  | 0.714  |   0.718
Neg. binomial   | 0.310  | 0.373  | 0.381  |            | 0.559  | 0.686  | 0.702  |
n = 10
Thomas          | 0.20…  | 0.246  | 0.251  |            | 0.443  | 0.479  | 0.484  |
Neyman          | 0.198  | 0.245  | 0.251  |   0.257    | 0.431  | 0.477  | 0.483  |   0.489
Neg. binomial   | 0.181  | 0.242  | 0.249  |            | 0.319  | 0.455  | 0.472  |
n = 6
Thomas          | 0.138  | 0.183  | 0.188  |            | 0.307  | 0.345  | 0.349  |
Neyman          | 0.135  | 0.182  | 0.188  |   0.194    | 0.296  | 0.342  | 0.348  |   0.354
Neg. binomial   | 0.118  | 0.179  | 0.186  |            | 0.199  | 0.323  | 0.339  |










For all $n$ except 101 the differences were checked by the method of interpolation recommended in the
introduction to Pearson & Hartley's tables (1954, p. 13). For $n = 101$ the main term in (3) was found
from the Wilson-Hilferty normal approximation.

The table shows in its comparison of (3) with Kathirgamatamby's results that the main term
$$F_{n-1} = \Pr(\chi^2_{n-1}>\chi^2_{n-1,\alpha}/a),$$
which depends only on $a = \kappa_2/\kappa_1$ and not on the higher cumulants, is a good approximation to the power
for most practical purposes when $n = 51$ and 101. It is less than 2.5 % higher than Kathirgamatamby's
figure even for $\mu$ as low as 5. If more accuracy is required the second term gives an extremely good
correction down to $\mu = 5$ and in most cases down to $\mu = 1$.

We have no independent calculation by which to check the accuracy of the correction when $n = 6$, 10
and 20 and $\mu$ has the same values as before, although it is to be expected that it will not be as good as
for $n = 51$ and 101.
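The main term alone is easily computed. The sketch below is an illustration rather than the method of the note: it evaluates the $\chi^2$ integral from the power series for the incomplete gamma function and finds the percentage point by bisection, whereas the note used tables and, for $n = 101$, the Wilson-Hilferty approximation. The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Pr(chi-square with df degrees of freedom > x), by the series for P(a, t)."""
    if x <= 0:
        return 1.0
    a, t = 0.5 * df, 0.5 * x
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= t / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(t) - t - math.lgamma(a))
    return 1.0 - lower


def chi2_upper_point(alpha, df):
    """Value X with Pr(chi-square_df > X) = alpha, found by bisection."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def main_term_power(n, a, alpha=0.05):
    """F_{n-1} = Pr(chi-square_{n-1} > chi-square_{n-1, alpha} / a)."""
    x = chi2_upper_point(alpha, n - 1)
    return chi2_sf(x / a, n - 1)
```

For $n = 6$ and $n = 10$ with $a = 1.5$ this reproduces the main terms 0.194 and 0.257 of Table 2.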


My thanks are due to Miss B. I. Harley for checking the main part of the calculations, which were in 
some places in need of correction. 


REFERENCES 


Bartlett, M. S. (1935). Proc. Camb. Phil. Soc. 31, 223.

Bateman, G. I. (1950). Biometrika, 37, 59.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

Gayen, A. K. (1949). Biometrika, 36, 353.

Kathirgamatamby, N. (1953). Biometrika, 40, 225.

Pearson, E. S. & Hartley, H. O. (1954). Biometrika Tables for Statisticians, 1.
Cambridge University Press.


Some properties of the bivariate normal distribution 
considered in the form of a contingency table 


By H. O. LANCASTER
School of Public Health and Tropical Medicine, Sydney 


1. In considering the properties of contingency tables and bivariate frequency distributions Karl
Pearson (1904) showed that if a bivariate normal distribution is classified in a two-way table the
contingency and the correlation parameter are related by the expression

$$\phi^2 = \chi^2/N = \rho^2/(1-\rho^2). \tag{1}$$
It is assumed that $N$, the number of observations, is large and that the class intervals are very narrow;
the equation, in fact, represents a limiting property. For a $p$-way classification based on a
multivariate normal distribution of dimension $p$ Pearson also showed that

$$\phi^2 = 1/\sqrt{(RR')}-1, \tag{2}$$

where $R$ is the determinant of the correlation matrix $\mathbf{R}$ and $R'$ is the determinant of the matrix $(2\mathbf{I}-\mathbf{R})$.
It was on the basis of these results that Pearson proposed his coefficient of contingency

$$C = \left(\frac{\phi^2}{1+\phi^2}\right)^{\frac12}. \tag{3}$$

In the limiting normal case this is equal to the correlation parameter $\rho$. More generally, it may be
regarded as an invariative property of a distribution, being unchanged by any arbitrary, not necessarily
linear transformation of the scale of either or both variates.


2. In 1940 Fisher considered contingency tables from the point of view of discriminant analysis. 
Suppose that ‘scores’, i.e. arbitrary variate values, are assigned to the rows and also to the columns of 
a contingency table: what are the best scores to assign to the rows so that a linear function of them will 
best differentiate the classes determined by the columns, and vice versa? This turns out to be a problem 
in maximizing the correlation between the scores and the required correlations are those known as 




'canonical' in the sense of Hotelling (1936). The work was continued and developed by Maung (1941).
In particular, Maung quotes a result by Fisher which gives the observed frequency in terms of the
canonical correlations; in fact, if the frequency is $a_{ij}$ with marginal totals $a_{i.}$, $a_{.j}$ and total $a_{..}$, and if the
canonical correlations are $R_1, R_2, \dots, R_{m-1}$, we have

$$a_{ij} = \frac{a_{i.}a_{.j}}{a_{..}}\left(1+\sum_{k=1}^{m-1}x_ky_kR_k\right), \tag{4}$$
where $x$ and $y$ are the assigned scores corresponding to the given cell.

3. In this note I derive some further theorems in this field and link together some hitherto
disconnected results. It will be shown that (4) is, in the limit, equivalent to the well-known Mehler identity
or tetrachoric series:

$$(2\pi)^{-1}(1-\rho^2)^{-\frac12}\exp\{-\tfrac12(x^2-2\rho xy+y^2)/(1-\rho^2)\} = (2\pi)^{-1}\exp\{-\tfrac12(x^2+y^2)\}\left(1+\sum_{i=1}^{\infty}\rho^i\psi_i(x)\psi_i(y)\right), \tag{5}$$
where $\psi_i(x)$ is the standardized Hermite-Tchebycheff polynomial of order $i$.
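The Mehler identity lends itself to a direct numerical check. The sketch below assumes the standardization $\psi_i = He_i/\sqrt{i!}$, with $He_i$ the Hermite polynomial orthogonal under the unit normal density, and compares the ratio of the two densities with the truncated series; the function names and the truncation point are choices made for the illustration.

```python
import math


def psi_values(n_max, x):
    """Standardized Hermite polynomials psi_0(x), ..., psi_n_max(x).

    Recurrence: psi_{k+1} = (x psi_k - sqrt(k) psi_{k-1}) / sqrt(k + 1).
    """
    values = [1.0]
    prev, cur = 0.0, 1.0
    for k in range(n_max):
        prev, cur = cur, (x * cur - math.sqrt(k) * prev) / math.sqrt(k + 1)
        values.append(cur)
    return values


def density_ratio(x, y, rho):
    """Bivariate normal density divided by the product of its unit normal margins."""
    q = (2.0 * rho * x * y - rho * rho * (x * x + y * y)) / (2.0 * (1.0 - rho * rho))
    return math.exp(q) / math.sqrt(1.0 - rho * rho)


def mehler_series(x, y, rho, terms=60):
    """1 + sum over i >= 1 of rho^i psi_i(x) psi_i(y), truncated after `terms` terms."""
    px, py = psi_values(terms, x), psi_values(terms, y)
    return sum(rho ** i * px[i] * py[i] for i in range(terms + 1))
```

For moderate $\rho$ the series converges quickly; with $\rho = 0.5$ sixty terms are far more than needed.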


A MAXIMAL PROPERTY OF THE BIVARIATE NORMAL DISTRIBUTION 
4. We may consider the variables standardized so as to have unit variance and take $0<\rho<1$.

THEOREM. Let $x$ and $y$ be jointly distributed in the bivariate normal distribution with correlation $\rho$. If
now a transformation, $x' = x'(x)$, $y' = y'(y)$, is made to any new variables $x'$ and $y'$ such that

$$(2\pi)^{-\frac12}\int_{-\infty}^{\infty}x'^2\exp(-\tfrac12x^2)\,dx \quad\text{and}\quad (2\pi)^{-\frac12}\int_{-\infty}^{\infty}y'^2\exp(-\tfrac12y^2)\,dy$$

are finite, then the correlation of the new variables is less in absolute value than $\rho$. That is, $\rho$ is the maximum
canonical correlation.

Under the conditions of the theorem, the new variables may be expressed in a series of standardized
Hermite-Tchebycheff polynomials,

$$x' = a_0+a_1\psi_1(x)+a_2\psi_2(x)+\dots \tag{6}$$
such that
$$(2\pi)^{-\frac12}\int_{-\infty}^{\infty}\left\{x'-a_0-\sum_{1}^{k}a_i\psi_i(x)\right\}^2\exp(-\tfrac12x^2)\,dx$$
is arbitrarily small and $\sum_1^\infty a_i^2$ convergent. Moreover, the correlation is unaffected by either a linear change
of scale or a change of origin, so that without loss of generality we can write
$$x' = \sum_{1}^{\infty}a_i\psi_i(x), \qquad y' = \sum_{1}^{\infty}b_i\psi_i(y), \qquad \sum_1^\infty a_i^2 = \sum_1^\infty b_i^2 = 1, \tag{7}$$
where the $\psi_i(x)$ obey the orthogonality relations

$$(2\pi)^{-\frac12}\int_{-\infty}^{\infty}\psi_i(x)\psi_j(x)\exp(-\tfrac12x^2)\,dx = \delta_{ij}. \tag{8}$$

By a consideration of the expectation of $\exp(tx-\tfrac12t^2+uy-\tfrac12u^2)$, we find

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\psi_i(x)\psi_j(y)f(x,y)\,dx\,dy = \delta_{ij}\rho^i, \tag{9}$$

where $f(x, y)$ is the density function of the bivariate normal population.
The variances of $x'$ and $y'$ are easily found to be unity by the orthogonal properties of the $\psi_i$. We have
then
$$\operatorname{corr}(x', y') = \sum_{1}^{\infty}a_ib_i\rho^i, \tag{10}$$
and this is less than $|\rho|$ unless $a_1 = b_1 = 1$. $|\rho|$ is therefore the maximum canonical correlation under
a very general class of transformations. This was proved by Maung (1941) by an alternative method.
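Relations (9) and (10) can be illustrated numerically. The sketch below evaluates the double integral in (9) by a simple Riemann sum, writing $y = \rho x + \sqrt{(1-\rho^2)}\,e$ with $x$ and $e$ independent unit normal variables; the grid, the function names and the particular coefficients used in the example are assumptions of the illustration.

```python
import math


def psi(n, x):
    """Standardized Hermite polynomial of order n (He_n(x) / sqrt(n!))."""
    prev, cur = 0.0, 1.0
    for k in range(n):
        prev, cur = cur, (x * cur - math.sqrt(k) * prev) / math.sqrt(k + 1)
    return cur


def expected_product(i, j, rho, half_width=10.0, step=0.1):
    """E[psi_i(x) psi_j(y)] for a standard bivariate normal with correlation rho."""
    s = math.sqrt(1.0 - rho * rho)
    count = int(round(2.0 * half_width / step)) + 1
    grid = [-half_width + k * step for k in range(count)]
    weights = [step * math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi) for g in grid]
    total = 0.0
    for x, wx in zip(grid, weights):
        inner = sum(we * psi(j, rho * x + s * e) for e, we in zip(grid, weights))
        total += wx * psi(i, x) * inner
    return total


def series_correlation(a, b, rho):
    """Right-hand side of (10) for coefficients a_1, a_2, ... and b_1, b_2, ..."""
    return sum(ai * bi * rho ** (k + 1) for k, (ai, bi) in enumerate(zip(a, b)))
```

With $\rho = 0.6$ and the unit-norm coefficients $a = (0.8, 0.6)$, $b = (0.6, 0.8)$, `series_correlation` gives 0.4608, which is below $\rho$ as the theorem requires.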







5. THEOREM. The values to be assigned to the canonical variables corresponding to the canonical
correlations in descending order are $\psi_i(x)$ and $\psi_i(y)$. The canonical correlations are the powers of $\rho$.

We have already shown that the maximum canonical correlation corresponds with $\psi_1(x) = x$ and
$\psi_1(y) = y$. Let us take a second set of values $x''$ and $y''$, such that

$$E(x'') = E(y'') = 0 \quad\text{and}\quad E(xx'') = E(yy'') = 0 \quad\text{and}\quad E(x''^2) = 1 = E(y''^2).$$

We may write
$$x'' = \sum_{1}^{\infty}c_i\psi_i(x), \qquad y'' = \sum_{1}^{\infty}d_i\psi_i(y). \tag{11}$$
The conditions of the theorem again enable us to set $\sum_{i=1}^{\infty}c_i^2$ and $\sum_{i=1}^{\infty}d_i^2$ equal to unity.
Further $E(xx'') = 0$ forces $c_1$ to be zero and similarly $d_1$ is zero.
Now
$$\operatorname{corr}(x'', y'') = \sum_{2}^{\infty}c_id_i\rho^i, \tag{12}$$
and this is maximal in absolute value only if $c_2 = d_2 = 1$ and all other $c_i = d_i = 0$.
This process can be extended by induction and proves the theorem.


COROLLARY 1. If a choice of variables can be made so that a joint bivariate normal distribution results
from a contingency table, then the canonical correlations are powers of the greatest of them and the sets
of canonical variables are the standardized Hermite-Tchebycheff polynomials.

COROLLARY 2. The roots of the determinantal equation usually solved in the treatment of this problem
would be powers of the greatest of them but for disturbances due to sampling errors and difficulties
caused by grouping.

COROLLARY 3. In the normal case the concept of minimum correlation has no validity. The lowest of
the non-zero correlations will be $\rho^{m-1}$ approximately and deviations from this value will be due again
to sampling and difficulties caused by grouping.


FISHER’S IDENTITY AND MEHLER’S IDENTITY 


6. Consider the identity (4) in the limiting case when the contingency table with elements $a_{ij}/a_{..}$
becomes the frequency function $f(x, y)\,dx\,dy$. We have

$$f(x,y)\,dx\,dy = \left\{1+\sum_{1}^{\infty}\rho^i\psi_i(x)\psi_i(y)\right\}(2\pi)^{-1}\exp\{-\tfrac12(x^2+y^2)\}\,dx\,dy, \tag{5 bis}$$

which is the Mehler series.

THE RELATIONSHIP BETWEEN $\chi^2$ AND CORRELATION

7. In reconsidering the partition of $\chi^2$ in a contingency table Lancaster (1953) used the
Hermite-Tchebycheff polynomials to avoid the infinitely fine subdivision used by Pearson. In the notation of
this paper he derived

$$\chi^2/a_{..} = R_1^2+R_2^2+\dots+R_{m-1}^2 \simeq \rho^2+\rho^4+\dots+\rho^{2(m-1)} = \rho^2(1-\rho^{2(m-1)})/(1-\rho^2). \tag{13}$$
Evidently
$$\chi^2/a_{..} \ge R_1^2, \tag{14}$$
and $R_1$ is greater than any other observed correlation. A more general result is that in a complex
contingency table
$$\chi^2/a_{..} \ge \sum_{i<j}r_{ij}^2, \tag{15}$$
where $r_{ij}$ are the observed correlations between any variables. For if we partition $\chi^2$ making use of the
direct product of matrices (Lancaster, 1951) we may take the first row of each to be of the form $\sqrt{w_i}$ and
the second row $x_i\sqrt{w_i}$, where $x$ is suitably normalized, the second row being orthogonal to the first. The
remaining rows may be filled in with due regard to the orthogonal conditions. In the observed case,
the set of canonical variables for $x$, when the correlation between $x$ and $y$ is being maximized, may be
different from the set when the correlation between $x$ and $z$ is being maximized, so that we cannot assert
that $\chi^2/a_{..}$ is greater than the sum of the squares of the canonical correlations.




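We can obtain a result analogous to (2):
$$\log(1+\phi^2) = -\tfrac12\log(RR') \quad\text{(Pearson notation)}$$
$$= -\tfrac12\log|\mathbf{I}+\mathbf{P}|-\tfrac12\log|\mathbf{I}-\mathbf{P}|, \tag{16}$$

where $\mathbf{P}$ is the matrix with elements $p_{ij} = r_{ij}$ for $i \ne j$, and $p_{ii} = 0$. If $\phi^2<1$, we may expand both sides
using the expansion of Durbin & Watson (1950) on the right:

$$\phi^2-\tfrac12\phi^4+\tfrac13\phi^6-\dots = \tfrac12\operatorname{tr}\mathbf{P}^2+\tfrac14\operatorname{tr}\mathbf{P}^4+\dots,$$
and so
$$\phi^2 \ge \tfrac12\operatorname{tr}\mathbf{P}^2 = \sum_{i<j}r_{ij}^2. \tag{17}$$
Even for $\phi^2>1$ this still holds, for
$$1+\phi^2 = \exp\{\tfrac12\operatorname{tr}\mathbf{P}^2+\tfrac14\operatorname{tr}\mathbf{P}^4+\dots\} \ge 1+\tfrac12\operatorname{tr}\mathbf{P}^2,$$
$$\phi^2 \ge \sum_{i<j}r_{ij}^2. \tag{18}$$
There are various determinantal conditions on the $r_{ij}$ derived by Pearson (1904) but it appears
that they can all be summarized by saying that $(\mathbf{I}+\mathbf{P})$ and $(\mathbf{I}-\mathbf{P})$ of (16) are both positive definite.

Pearson (1904) noted that the first is necessary for the system to be a correlation system and the second
condition is necessary for $\phi^2$ to have a finite limit.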
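A small numerical illustration of (16)-(18) follows. It assumes a three-variate system with arbitrarily chosen correlations 0.5, 0.3 and 0.4 (not taken from the paper), computes $\phi^2$ from the two determinants and compares it with the sum of squared correlations; the function names are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det


def phi_squared(P):
    """phi^2 from (16): 1 + phi^2 = (|I + P| |I - P|)^(-1/2)."""
    n = len(P)
    plus = [[(1.0 if i == j else 0.0) + P[i][j] for j in range(n)] for i in range(n)]
    minus = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]
    return 1.0 / math.sqrt(determinant(plus) * determinant(minus)) - 1.0


def sum_squared_correlations(P):
    """Sum of r_ij^2 over i < j, i.e. half the trace of P squared."""
    n = len(P)
    return sum(P[i][j] ** 2 for i in range(n) for j in range(i + 1, n))


# Illustrative correlations only.
P_EXAMPLE = [[0.0, 0.5, 0.3],
             [0.5, 0.0, 0.4],
             [0.3, 0.4, 0.0]]
```

Here $|\mathbf{I}+\mathbf{P}| = 0.62$ and $|\mathbf{I}-\mathbf{P}| = 0.38$, so that $\phi^2$ exceeds 1 while the sum of squared correlations is 0.5, in agreement with (18). In the bivariate case the same function reduces to $\rho^2/(1-\rho^2)$ of (1).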


Discussion 


8. This paper relates some hitherto disconnected results. A maximal property of the bivariate
distribution is proved. Further, in the theoretical case, we have derived the interesting result that the sets
of canonical variables are the standardized Hermite-Tchebycheff polynomials and the canonical
correlations the corresponding power of $|\rho|$. This result raises a doubt as to whether the common
procedure of considering the roots of the determinantal equation separately can be justified if the hypothesis
specifies a bivariate normal population. It would appear that once the first set of canonical variables is
obtained, little further is gained by examining other sets since they differ from certain polynomial
functions of the first set of canonical variables only because of sampling difficulty. Furthermore, a
consideration of the hypothesis of the existence of one canonical correlation as in Williams (1952) shows
that a value of the canonical correlation gives a limit to the range of the canonical variables; for the
modified hypothesis is
$$p_{ij} = p_{i.}p_{.j}(1+\rho x_iy_j) \tag{19}$$
and $p_{ij} \ge 0$, so that
$$x_iy_j \ge -\rho^{-1} \tag{20}$$
for any pair of $y_j$ and $x_i$. But both $x$ and $y$ must take some positive values and some negative values under
the maximization procedure, so that (20) can only hold if neither $x$ nor $y$ has an infinite range. It appears
that these difficulties have not been thoroughly explored as yet.


This paper is published with the permission of the Director-General of Health, Canberra. 


REFERENCES 


Durbin, J. & Watson, G. S. (1950). Testing for serial correlation in least squares regression. I.
Biometrika, 37, 409.

Fisher, R. A. (1940). The precision of discriminant functions. Ann. Eugen., Lond., 10, 422.

Fisher, R. A. (1941). Cited by Maung (1941).

Hotelling, H. (1936). Relations between two sets of variables. Biometrika, 28, 321.

Lancaster, H. O. (1951). Complex contingency tables treated by the partition of $\chi^2$. J. R. Statist.
Soc. B, 13, 242.

Lancaster, H. O. (1953). A reconciliation of $\chi^2$. Sankhyā, 13, 1.

Maung, K. (1941). Measurement of association in a contingency table with special reference to the
pigmentation of hair and eye colours of Scottish school children. Ann. Eugen., Lond., 11, 189.

Pearson, K. (1904). Mathematical contributions to the theory of evolution. XIII. On the theory of
contingency and its relation to association and normal correlation. Drap. Co. Mem. Biom. Ser.
no. 1.

Williams, E. J. (1952). Use of scores for the analysis of association in contingency tables. Biometrika,
39, 274.










REVIEWS 


Einfihrung in die mathematische Statistik. By L. Scumerrerer. Vienna: Springer- 
Verlag. 1956. Pp. xxiii+ 405. £4. 3s. 6d. 


In this book it is evident that mathematical rigour has had first call (indeed, the publishers say 
explicitly that it is primarily for mathematicians) and a level of mathematics corresponding to the 
English B.Se. General is assumed in the reader, augmented by knowledge of the elements of the 
Stieltjes integral, elementary set theory, Borel measure of sets and the Lebesgue integral. But, for 
all this, the mathematics has been kept to a minimum, and the general reader who is prepared to 
take a few theorems on trust will find this an eminently readable book. Thus the first chapter starts
with the Kolmogorovian axiomatics, developing the set theory as needed, and follows with the theory 
of characteristic functions omitting the proofs of the uniqueness and limiting theorems. The dis- 
advantages of the mathematical approach have not been entirely avoided however, and the second 
chapter treating distribution theory (together with the associated tests at an intuitive level) begins 
by proving Cochran’s theorem, deducing, for instance, the distribution of the normal sample variance 
therefrom. The third chapter gives a straightforward account of confidence intervals and the fourth 
gives a rigorous but concise treatment of estimation theory. The fifth chapter giving the theory of 
test functions has recently been recommended by Neyman in J. R. Statist. Soc. (It is a pity here that 
Wilks has not been given credit for his theorem on likelihood ratio tests and Kołodziejczyk’s general
linear hypothesis has not been given explicitly.) The sixth chapter develops multiple regression theory
and the elements of multivariate analysis; Wishart’s, Hotelling’s, Fisher’s discriminant and Mahalanobis’s
D² distributions are derived by elementary methods. The seventh chapter on ‘non-parametric
tests’ is the least satisfactory, being rather scrappy and eclectic (but it only claims to be an introduction).
Ordered variables, the probability integral transform, the runs-, Kolmogorov–Smirnov-,
Wilcoxon- and sign-tests are briefly dealt with (Continental contributions being given here more
credit than is perhaps their due) and there is little mention of power. The short eighth chapter treats
briefly of the classical Bayes theory and its consequences in regard to modern theory. The whole is
preceded by a full German-English glossary and followed by an index of moderate length. [Note: the
index-reference to χ² as on p. 379 must be a misprint for p. 279.]

Criticisms of the book are chiefly on the score of omissions and relative neglect, partly no doubt due 
to reasons of space and the nature of the treatment. I feel that a fuller treatment of χ² is deserved;
contingency tables, index of dispersion, Eisenhart’s theorem on the power and optimum grouping 
do not appear to be mentioned. More complex analysis of variance models and their powers should 
be at least described. Combinatorial methods get scant treatment (Mood’s tests are not mentioned) 
and discrete variables also. I cannot find that the negative binomial, Polya’s or Neyman’s Contagious, 
distributions are mentioned, while rank correlation gets only a footnote reference. The distribution of 
the correlation coefficient, Smirnov’s ω² and Neyman’s ψ² also seem to be absent. Sequential testing
could well be given a few pages rather than a bare mention. D. E. BARTON


Statistical Methods and Scientific Inference. By Sir Ronald A. Fisher. Edinburgh:
Oliver and Boyd. 1956. Pp. viii+175. 16s. 


This book is welcome in providing an up-to-date and authoritative account of Sir Ronald Fisher’s 
views on statistical inference. Readers new to statistics are, however, advised not to try to read 
Fisher’s book entirely on its own. The issues discussed are fairly technical and sometimes controversial;
and this book largely concentrates on the author’s original contributions, without always adequate 
representation of other points of view. The book is in consequence likely to repay study most by those 
already well acquainted with current statistical methods. 

The first two chapters are mainly historical, but at the end of the second chapter we first come to 
the essential step of saying what is meant by probability (essential because our attitude on probability 
will colour our whole approach to the inference problem). A more detailed discussion would have been 
useful on what is meant by statistical inference; for the reviewer’s experience is that the word ‘statistics’ 
is also interpreted differently by different people. Thus in the recent American book, The Foundations 
of Statistics, by L. J. Savage, it is implied in the introduction that mathematical statistics is to be
identified with the ‘mathematical treatment of problems of inductive inference’. The historical and
scientific association of the word ‘statistics’ with those types of observational phenomena, in which 
the properties of the group rather than of the individual are considered, seems there ignored. Yet it 
is this aspect which has given rise to the frequency or statistical interpretation of probability, in 
contrast with the notion of probability as a degree of belief or credibility. Fisher, while favouring the 
statistical interpretation, emphasizes that, before a probability defined by some aggregate or population 
is associated with any particular event, we must be unable to associate any ‘subaggregate’, with a 
different probability, to this same event. This is a relevant inductive consideration, though it seems 
logically distinct from the concept of statistical probability itself, and there may be difficulties in 
assigning to it any very precise meaning. It is well known that there are at least two rival schools 
of thought on statistical inference: either one can play down the statistical aspects, and argue ex- 
plicitly in terms of inverse probability, prior and posterior probabilities, and perhaps utilities, or one 
can attempt, as many statisticians do, to summarize statistical features of the data as an aid to, but 
not necessarily as a substitute for, one’s final inductive inferences or decisions. The impression gathered 
from this book is that Fisher is putting forward methods that logically lie somewhere between these 
two schools, but in the reviewer’s opinion these methods create difficulties and apparent inconsistencies 
that outweigh any possible advantages. Thus in Chapter 3 (bottom of p. 42) Fisher uses such a phrase 
as ‘a test of significance applied to a single hypothesis by a unique body of observations’, but omits to 
explain either on what justification the concept of a random sample is applied to such unique data, 
or on what basis the critical region, in the Neyman–Pearson sense, of the significance test is arrived
at (a small probability for the observations is not sufficient; for example, any hand at bridge has the 
same probability as a hand with all trumps). Both in this book and elsewhere Fisher has criticized 
the notion of ‘repeated sampling’ without appearing to recognize its close relation to the concept of 
the random sample.* The idea that the random variation should be restricted as far as possible, so 
that only samples as far as possible like the one observed should be considered, is a valuable one which 
Fisher has promoted on several occasions; if the statistical viewpoint is to be retained, however, there 
is a limit to what can be done in this way. For example, it is possible to test the difference in means 
of two samples either on the assumption of normality or on a permutation test based on the actual 
observational values; but insistence on the latter approach is neither customary nor (for very small 
samples) necessarily rewarding. The failure of the Behrens–Fisher test to have the frequency interpretation
required by many statisticians arises in the reviewer’s opinion from an inconsistent attitude
to the observed variance ratio s₁²/s₂², which is at one stage of the derivation considered random and
at the next considered fixed.† The suggestion (p. 56) that no new axioms are needed for fiducial
inference would thus not be generally accepted. 
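
The permutation test referred to above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the two samples are invented for the purpose and do not come from the book under review. Every reallocation of the pooled observations to two groups of the original sizes is enumerated, and the proportion giving a difference in means at least as large as that observed is reported.

```python
from itertools import combinations


def permutation_p_value(x, y):
    """Exact two-sided permutation test for a difference in means."""
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(pooled) - n_x
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    # Enumerate every way of choosing which pooled values form the first group.
    for idx in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


# Hypothetical samples, chosen only to keep the enumeration small.
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 5.0, 6.0]
p = permutation_p_value(sample_a, sample_b)
```

With two samples of three there are only twenty reallocations, so even completely separated samples cannot give a two-sided level below 0.1; this is one sense in which the approach is not necessarily rewarding for very small samples.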

The exposition of Fisher’s theory of fiducial inference given in this book is as complete as will be 
found anywhere, but the ‘theory’ when more than one unknown parameter is present still seems left 
rather more a matter of definition for each individual problem than a general method. In the case of 
one parameter, the theory was accepted by many statisticians because of its frequency interpretation, 
which renders it practically equivalent to the theory of (efficient) confidence intervals obtained from 


* Cf. E. S. Pearson’s remarks in ‘Statistical concepts in their relation to reality’, J. R. Statist. Soc.
B, 17 (1955), 204-7. 

† It is not the primary intention of the reviewer to answer here specific references in the book to
his earlier criticisms of the Behrens–Fisher test, but the following remarks are relevant:

(i) (p. 96) The ‘randomization test’ was put forward purely to demonstrate its incompatibility on
a frequency interpretation with the Behrens–Fisher solution. The criticisms by Fisher of the use of
randomization with tests of significance have some weight, but the use of randomization in the theory
of games and in experimental design are also not above criticism if attention is focused on a single
decision or a single experiment. The test actually used on a practical problem in the first critical
paper (Proc. Camb. Phil. Soc. 32 (1936), 560-6) ascribed a significance level lying between the values
obtained on the assumptions σ₁²/σ₂² = 0 and 1. A closer assessment of the significance level, taking into
account the observed variance ratio s₁²/s₂², is now available from Welch’s test (Biometrika, 34 (1947),
28-35).

(ii) (pp. 120 and 162) The reviewer's original reference to the particular function μ + σ was unfortunate,
for he had overlooked the confidence (as well as fiducial) interval solution for the particular
linear function μ + aσ; this solution follows from the sampling distribution of the ‘non-central t’
being independent of σ. The general principle still, however, stands that a confidence interval solution
for an arbitrary function of two unknown parameters does not follow automatically from a simultaneous 
sampling distribution involving them. Even in Fisher’s fiducial theory, the problem of uniqueness is 
hardly settled by the discussion of a particular example on pp. 169-72. 
















continuous random variables. In the book confidence intervals are almost entirely referred to in 
relation to discrete variables, but as Fisher has rejected the frequency basis for fiducial intervals the 
value of confidence intervals in all cases must be considered. Such confidence intervals are admittedly 
of more relevance in some applications than others; however, such routine statistical methods for- 
tunately, if not used too automatically, often have an inductive value even when the data are more 
‘unique’. In the case of small sample theory, over which there is likely to be most argument, this 
inductive value arises partly because statistical experience (external to the sample) on the nature of 
the population often exists. For fiducial inference Fisher has introduced the requirement (p. 56) that 
no knowledge a priori on the value of the parameter should be available. This is desirable if the fiducial 
inference is to be regarded as properly inductive, but the scope of fiducial inference seems now aca- 
demically narrow, and the logical status of other prior knowledge or assumptions in the theory is left 
extremely obscure. 

The problem of specification arises again in the section beginning on p. 66 (cf. also p. 127), where 
an explicit record of the likelihood function, as it varies with an unknown parametric value, is proposed 
as a summary of the data. The likelihood function is certainly relevant and important for many 
statistical inferences; nevertheless, any table proposed on the basis of a particular probability model 
would rarely be an adequate substitute for the data themselves, as the model might not be accepted 
as appropriate by other workers. Standard statistical methods become less available when the 
data or phenomena under consideration are highly individual, and standard inductions likewise. 
Even when the likelihood function is accepted as appropriate, the circumstance that it does not 
involve the number of ways by which the observed outcome might have occurred implies that it is 
not dependent on the sampling procedure (for example, whether direct or inverse binomial sampling), 
a knowledge of which is sometimes required. 
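
The remark about direct and inverse binomial sampling can be checked numerically. The sketch below is illustrative only: the counts (3 successes in 12 trials) and the function names are assumptions made for the example. It evaluates the probability of the same outcome under the two sampling rules and shows that the ratio is the same at every parameter value, so that the likelihood function, which is defined only up to a constant factor, cannot distinguish the two procedures.

```python
from math import comb


def binomial_likelihood(theta, successes, trials):
    # Direct sampling: the number of trials is fixed in advance.
    return (comb(trials, successes)
            * theta ** successes * (1 - theta) ** (trials - successes))


def inverse_binomial_likelihood(theta, successes, trials):
    # Inverse sampling: trials continue until a fixed number of successes,
    # so the final trial is necessarily a success.
    return (comb(trials - 1, successes - 1)
            * theta ** successes * (1 - theta) ** (trials - successes))


# Hypothetical outcome: 3 successes observed in 12 trials.
successes, trials = 3, 12
thetas = [0.1, 0.25, 0.5, 0.75]
ratios = [
    binomial_likelihood(t, successes, trials)
    / inverse_binomial_likelihood(t, successes, trials)
    for t in thetas
]
```

The ratio is the quotient of the two combinatorial coefficients, here 220/55 = 4, whatever the value of theta. The sampling distributions of an estimate under the two rules nevertheless differ, which is why a knowledge of the procedure can still be needed.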

Apart from the concluding section on simultaneous fiducial distributions, the last chapter covers 
the theory of estimation associated with the name of the author; the importance and established value 
of this theory might well have justified its consideration earlier, except that it is now fairly familiar to 
the readers who, it has been here suggested, will most profit from this book. Such readers will be 
entertained but not otherwise diverted by Fisher’s occasionally provocative style. The problem of 
inductive scientific inference, and in particular statistical inference, is one in which impartiality and 
tolerance are as essential to further progress as rules of procedure. Any suspicion of dogmatism, 
from whatever quarter, or however illustrious the person concerned, would be regrettable.


M. S. BARTLETT 


Symposium on Monte Carlo Methods. Edited by H. A. Meyer. New York: John
Wiley and Sons, Inc.; London: Chapman and Hall, Ltd. 1956. Pp. xvi+382. 60s. 


Interest in Monte Carlo methods is increasing again after suffering a depression around 1952. It is 
reflected in this book which contains papers presented at a symposium held at the University of 
Florida on 16 and 17 March 1954. There are twenty contributions headed by an introduction by 
A. W. Marshall which is intended as a guide to the non-specialist; this traces the development of 
Monte Carlo since its christening in about 1947 and attempts to place the succeeding papers in 
perspective, giving a brief description of most of them. 

Three papers are devoted to the generation and testing of random variables. O. Taussky and J. Todd 
give a brief survey of the available methods of generating pseudo random numbers by arithmetical 
processes and present the results of some tests on these. J. W. Butler’s paper contains many useful 
ideas on the important subject of transforming these random numbers into samples from given 
probability distributions. He discusses the three basic methods which he calls the direct, composition 
and rejection methods. It should perhaps have been pointed out in discussing these that the last, using 
as it does a variable number of random numbers, is often inconvenient when several correlated problems 
have to be studied. 
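
The point about the rejection method may be made concrete with a small Python sketch. It is not taken from Butler's paper: the target density f(x) = 2x on (0, 1), the class and the function names are all assumptions chosen for illustration. The direct method uses exactly one random number per sample, whereas the rejection method uses an unpredictable number.

```python
import random
from math import sqrt


class CountingUniform:
    """Uniform (0, 1) source that records how many numbers it has supplied."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.used = 0

    def draw(self):
        self.used += 1
        return self._rng.random()


def direct_sample(source):
    # Direct (inverse transform) method: F(x) = x**2, so x = sqrt(u).
    return sqrt(source.draw())


def rejection_sample(source):
    # Rejection method: propose x uniformly, accept with probability f(x)/2 = x.
    while True:
        x = source.draw()
        if source.draw() <= x:
            return x


n = 1000
direct_source = CountingUniform(seed=1)
rejection_source = CountingUniform(seed=1)
direct_values = [direct_sample(direct_source) for _ in range(n)]
rejection_values = [rejection_sample(rejection_source) for _ in range(n)]
```

Here `direct_source.used` is always `n`, while `rejection_source.used` varies from run to run (about four per sample on average for this density). When the same stream of random numbers is to be reused across several related problems, this variable consumption puts the streams out of step, which is the inconvenience noted above.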

The remaining papers vary from the straightforward to the highly original. Of the latter that by 
H. F. Trotter and J. W. Tukey is outstanding. Here, amid many comments that show a very fine 
appreciation of the fundamental principles of good Monte Carlo practice, we are introduced to a new 
variance-reducing technique. This is applied to a complicated problem about normal samples in another 
paper where its success is amply demonstrated. It is a pity that the paper is marred by a rather 
obtuse explanation of the very heart of their idea. 

H. Kahn gives a very useful collection of formulae and results connected with many of the now 
commonly used Monte Carlo tricks, for several of which he is largely responsible. This is clearly the
voice of a very widely experienced practitioner of Monte Carlo. The diffusion of γ-rays through shields
is amply treated in two papers by M. J. Berger and by L. A. Beach and R. B. Theus, in which can be
found examples of many of the techniques described by Kahn. On the purer mathematical side,
J. H. Curtiss gives a lengthy account of an investigation into the relative efficiencies of a Monte Carlo 
and the normal computational procedures for solving a set of linear simultaneous equations. 

The book ends with a bibliography of some 300 entries divided into four sections, the first two, on 
Monte Carlo proper and random number generation, containing generous abstracts. There are source, 
name and subject indices to the whole book. 

Though most of the subject-matter in this book is not new it is a valuable collection of ideas that 
had previously had only a limited circulation. It is a pity it could not have been produced more closely 
after the date of the symposium. K. W. MORTON 


The Essentials of Educational Statistics. By F. G. Cornell. New York: John Wiley
and Sons Inc.; London: Chapman and Hall Ltd. 1956. Pp. 375. 46s. 


The author writes in his preface: ‘as an introduction to statistics this book is not a handbook, though 
it contains more than enough material for the usual first two semesters. The interest has been to 
include essentials both of specific techniques and of underlying ideas. It seems reasonable that a 
book with this purpose should cover a limited ground thoroughly rather than much ground lightly.’ 
In spite of the author’s words, however, this book undoubtedly falls into the category of handbooks. 

The student envisaged as a reader of this book must be one with no mathematical knowledge, since 
no derivations of any kind are given, the necessary theorems being stated quite clearly but without 
symbolism. For example, on pp. 113-14 when beginning to discuss the sample mean the author 
states as principles: 

(a) The mean of means of all possible random samples of size n drawn from a population equals the 
mean of the population. 

(b) The variance of the sampling distribution of means of size n is equal to 1/n times the variance 
of the population. 

(c) The sampling distribution of means of samples of size n from a normal population is itself 
normal.

(d) The sampling distribution of means of size n ... for a wide variety of non-normal populations 
approaches a normal distribution as n becomes increasingly large. 

[It might have been well to have inserted ‘with replacement’ or possibly ‘independent’ somewhere 
in these principles but this point is mentioned in the discussion in the text.] This method is quite 
clearly pursued throughout the book and may be found adequate for students who want to know how 
the various standard techniques work without asking why they are used and how their properties are 
derived. It is inevitable, however, that mathematical symbols are used for criteria, test functions 
etc. and this may cause the student who likes words, but no mathematics, some anxiety. 
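
Principles (a) and (b), with the qualification ‘with replacement’ suggested above, can be verified exactly by enumeration. The following Python sketch is an illustration only; the population values and the sample size are invented. It lists every ordered sample of size n drawn with replacement and compares the mean and variance of the sample means with those of the population.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance(values):
    # Population variance (divisor N), computed exactly with fractions.
    values = list(values)
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), Fraction(0)) / len(values)


# Hypothetical population and sample size.
population = [Fraction(v) for v in (2, 3, 7, 12)]
n = 3

# Every ordered sample of size n drawn with replacement is equally likely.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mean_of_means = mean(sample_means)
variance_of_means = variance(sample_means)
```

Exact arithmetic gives `mean_of_means == mean(population)` and `variance_of_means == variance(population) / n`. Without replacement the second equality fails, which is why the qualification matters.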

The topics covered are the sampling distribution of the mean (population σ known and not known),
significance tests for correlation and regression, χ² for goodness of fit and in contingency tables, two
sample tests, simple analysis of variance, curvilinear regression and partial and multiple correlation.
Abbreviated tables of the normal curve and of t are appended.

The title itself is a misnomer. The link with education is tenuous and what the author has done is 
to write a very elementary statistics text-book. F. N. DAVID


Elementary Statistical Methods (revised). By W. A. Neiswanger. New York and
London: The Macmillan Company. 1956. Pp. 749. 48s. 6d. 


This is a substantial text-book suitable for use in a one year’s course in statistics. It is revised sub- 
stantially from the 1943 edition when discussing sampling methods and is extended by chapters on 
sampling errors and statistical inference. In much of the book the illustrative data is more recent
than was available at the time of the last edition.

The book opens with an excellent general discussion on the use of statistical methods and the
interpretation of results, including an extensive section on the way to ask questions and design a 
sample. Then follows a series of chapters on presenting the data obtained by means of charts and 
graphs, frequency distributions and calculated measures of average, dispersion and skewness. There
are two chapters on sampling errors and statistical inference, two on index numbers, one of which
summarizes those available in the United States, three on the analysis of time series including seasonal 
and cyclical variation and two on regression analysis and correlation. There are appendices on arith- 
metic and computational methods, and on the symbols commonly used; there are tables of random 
numbers, the normal integral and of values of t but not of χ². Each chapter is followed by a short
bibliography and a set of questions. 

The exposition throughout is very clear and within the field chosen most aspects of the subject are 
treated well and completely. In an elementary text-book there is always a problem of deciding how 
detailed to make the exposition and how much to leave to the student’s intelligence. Too much detail 
makes the task before the student look worse than it is and he may learn a lot of detail before he 
masters general principles; but if there is too little detail he may fail to follow essential steps. In my 
opinion this book, if anything, errs on the side of giving too much detail. It will perhaps seem rather 
ponderous, especially to those students who are taking several courses in the same session, and it 
might be better to give the student the idea and let him gradually sort out the snags himself as he 
meets them. For instance, does the student really need to be told that if he makes the scale of a chart 
too large some of the observations may occur outside the limits intended? On the other hand, detailed 
descriptions of the errors which may arise in the calculation of an arithmetic mean from a frequency 
distribution instead of from an individual list and of the method of calculating the true mean when 
working from a guessed value of the mean without using algebra are very good indeed, though in the
former case the problem of recorded values tending to bunch, say at multiples of ten, might have 
been stressed. The problem arising from assuming all values in a frequency class are concentrated at 
the mid-point of the class in the calculation of the standard deviation is not discussed in a similar 
way, and the correction factor (—Z*) in the short cut method is merely stated, not explained. 
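
For readers who want the explanation that the text-book omits, the usual short-cut calculation can be sketched as follows. This is a generic illustration in Python, not the book's own working: the class mid-points, frequencies and guessed mean are invented, and it is only presumed that the correction term shown here is the one the book states. Deviations d are taken from a guessed mean A; the true mean is A plus the mean of d, and the variance is the mean of d² less the square of the mean of d.

```python
from fractions import Fraction


def direct_mean_and_variance(midpoints, frequencies):
    """Mean and variance from deviations about the true mean."""
    n = sum(frequencies)
    m = Fraction(sum(f * x for x, f in zip(midpoints, frequencies)), n)
    var = sum(f * (x - m) ** 2 for x, f in zip(midpoints, frequencies)) / n
    return m, var


def short_cut_mean_and_variance(midpoints, frequencies, guessed_mean):
    """Mean and variance from deviations about a guessed mean."""
    n = sum(frequencies)
    d = [x - guessed_mean for x in midpoints]
    mean_d = Fraction(sum(f * di for di, f in zip(d, frequencies)), n)
    mean_d2 = Fraction(sum(f * di ** 2 for di, f in zip(d, frequencies)), n)
    # The correction term removes the effect of having guessed the mean wrongly.
    return guessed_mean + mean_d, mean_d2 - mean_d ** 2


# Hypothetical grouped data: class mid-points and their frequencies.
midpoints = [5, 15, 25, 35, 45]
frequencies = [2, 5, 8, 4, 1]

direct = direct_mean_and_variance(midpoints, frequencies)
short_cut = short_cut_mean_and_variance(midpoints, frequencies, guessed_mean=25)
```

The two results agree exactly whatever value is guessed, since the sum of squared deviations about any point A exceeds that about the mean by n times the square of the distance between them. Both calculations still treat every value in a class as lying at its mid-point, the assumption noted above as undiscussed.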

Calculations in the text were obviously undertaken with assistance from a calculating machine 
and products and quotients are commonly shown to seven, sometimes to ten or eleven significant 
figures. Early in the book there is a discussion on spurious accuracy and on errors arising in rounding, 
but there is little indication later of the application of these excellent precepts. Nor is guidance given 
to the student as to what he can do in practice to avoid having to use as many significant figures as 
in the text of the book if he possesses, for example, not a calculating machine but only a slide rule or 
the tables of logarithms and square roots at the end of the book. The method used produces some 
curious results. For instance in the calculation of a correlation coefficient we have

    r = 135,919,105 / 176,439,387

whilst individual values of a trend line, fitted by the method of least squares are given to seven
significant figures where in only four out of thirty-five values does the fitted line represent the original
series to the second figure. Looking at the calculations for this trend line made by two methods one
may well doubt if it is true that the simplicity of the short cut method strongly commends it. The
student might well also wonder why the short cut method for calculating a standard deviation gives
a result 28.906 cents and the method working with deviations from the true mean gives a result,
on the facing page, of only 28.9 cents, and this after having a sum of deviations squared calculated
to eleven significant figures.

The English student of economics using this book for an ancillary course in statistics will conclude 
that in America figures are not a scarce factor of production and hence go on to assume that any real 
short cut methods are added for academic interest. 

In general this is a good book for students who have the time and are prepared to learn the elements 
of a subject slowly but methodically, hence getting a good grounding. 




H. S. BOOKER 


Statistics: a New Approach. By W. Allen Wallis and Harry V. Roberts. Illinois:
The Free Press. 1956. Pp. xxxviii+637. $6.00. 


This is a cookery-book designed for those wishing to know about statistics who have no mathematical 
learning. The logical processes of the mathematics of statistical theory are explained in words with a 
bias—or so the reviewer thinks—in favour of those interested in economic applications. The book is 
divided into four sections entitled, the Nature of Statistics, Statistical Description, Statistical Inference 
and Special Topics. 

Statistical Description covers univariate and bivariate frequency distributions and measures of 
location, dispersion and association. In Statistical Inference we have probability and randomness,
a summary of tests based on the normal curve, and some notes on estimation. The section of Special
Topics is a somewhat ‘mixed bag’ containing the design of experiments, quality control, curvilinear 
regression and time series. Formulae are given freely but no derivations or justification other than 
in words are attempted. 

At the level of exposition aimed at the book may prove useful to the non-mathematician. Those 
with any knowledge of mathematics for whom, according to the preface, the book is also intended to
be useful, are likely to go empty away. F. N. DAVID


Theoretical Genetics. By R. B. Goldschmidt. University of California Press (for whom
Cambridge University Press act as agents). 1956. Pp. 563. 64s. 


Many readers of Biometrika are interested in genetics, and some of them have at one time or another 
contributed to its mathematical theory. All these could profitably read the Theoretical Genetics of 
Richard Goldschmidt, the grandmaster of contemporary genetics and president of the last International 
Congress of the subject. They might find—and perhaps be disturbed by their discovery—that genetics 
is not necessarily a happy hunting ground for statisticians having but little biological knowledge. 
While the book does not cover the more mathematical branches of genetics, such as population genetics 
or the theories of mutation and selection, it provides an exhaustive and authoritative discussion of 
the concept of the gene, as it developed during the author’s lifetime and in fact to a large extent by 
his own stimulating influence. The main topics are (1) the nature of the genetical material (239 pages), 
(2) the mode of action of the genetical material in controlling specific development (234 pages) and 
(3) the consequences of the nature and actions of the genetical material for evolution (18 pages). 

These three topics do not of course cover the entire field of modern genetics and their treatment 
does not aim at completeness even within their restricted framework; but the discussion even if 
subjective and selective is of the greatest interest for anybody who wants some clarity in his ideas of 
the most central and possibly most intricate field of biology. He will find some branches of genetics, 
if not ramshackle, at least far more dubious and incomprehensible than he would assume from a perusal 
of the ordinary text-books. He may also at times be amused by the exposure of premature and rather 
childish generalizations by meritorious and even famous research workers in the field of biochemical 
genetics, or by the attempts of embryologists to keep genetics out of their field. 

Goldschmidt is quite explicit in describing ‘gene action’ as we infer it as differential action, i.e. as 
resulting in abnormal or at least divergent development, but he does not draw the consequences of 
this fact. He does not consider genetics as a science of differences but rather as a normative science 
comparable to embryology or biochemistry. This seems to the reviewer the main weakness of this
excellent book. Fifty pages of Bibliography and two indices greatly add to its value. H. KALMUS


Rank Correlation Methods (second edition). By M. G. Kendall. London: Charles
Griffin and Co. Ltd. 1955. Pp. 196. 36s. 


This is a revised and much enlarged second edition of a book which first summarized the theory of rank 
correlation. In its new form the book brings up to date the theory of ranking and may be considered 
indispensable for the research worker in both the theoretical and practical fields. 


Probability (second English edition). By A. N. Kolmogorov. New York: Chelsea
Publishing Company. 1956. Pp. 84. $2.50. 


This is a second edition of the translation of the original German monograph Grundbegriffe der Wahr- 
scheinlichkeitsrechnung by the world’s leading probabilist. It is a book from which nearly all our 
modern ideas may be deemed to have sprung. In this, the second edition, there are no alterations. 
A supplementary bibliography by A. T. Bharucha-Reid, which neither adds to nor detracts from the 
value of the book, has been added in an attempt to ‘reflect the present status and direction of research 
activity in the theory of probability’. This is possibly a mistake, since the bibliography is ephemeral 
while the book is part of probability. 










Irrationalzahlen (second edition). By O. Perron. New York: Chelsea Publishing Co. 
1951. Pp. 199. Paper $1.50; cloth $3.25. 


This is a reprint of the second edition of Perron’s well-known work. (The third edition, of 1947, differed 
from the second only in one page and a footnote.) The paper and printing are good, and the reviewer 
could find no serious misprint. 

Starting from a set of twenty-one axioms satisfied by the rational numbers, the author defines 
irrational numbers as Dedekind sections of the rationals, and shows that they satisfy the same axioms. 
He goes on to deal with limits, powers and logarithms. Then without further use of analysis he dis- 
cusses various methods (particularly continued fractions) of approximating to irrational by rational 
numbers. The degree of accuracy attainable is investigated. He concludes with a chapter on algebraic 
and transcendental numbers, proving that the numbers e and π are transcendental. This means that
neither of them is a root of an algebraic equation, of any degree, with integral coefficients. 

The work may be regarded largely as an introduction to the author’s Die Lehre von den Kettenbrüchen.
There is a good bibliography. G. I. WATSON


Trigonometrical Series (second edition). By A. Zygmund. New York: Chelsea
Publishing Co. 1952. Pp. 329. Paper $1.50; cloth $4.95. 


It is now twenty-one years since Zygmund’s Trigonometrical Series appeared in the Monografje 
Matematyczne. The present edition is of the corrected reprint of 1952. To have produced this beautifully 
printed volume on excellent paper at $1.50 is a triumph of publishing economics. 

There is still no book on the subject which compares with Zygmund’s for clarity or thoroughness. 
It is a mathematician’s book, but apart from a knowledge of Lebesgue integration it presupposes no 
specialized knowledge of mathematics. Although the character of the book is to approach questions 
of convergence and summability concretely, many readers will find in it an excellent introduction to 
more generalized theories such as that of Linear Operations. There is a long chapter on Riemann’s 
theory of trigonometric series and a short one on Fourier’s integral. Examples and miscellaneous 
theorems supplement the text, and there is an extensive bibliography. 

No student of advanced mathematics and no mathematical library should be without this classic 
of analysis; the publishers are to be congratulated on their enterprise in bringing the volume within 
the reach of all. H. KESTELMAN 


Publications of the Mathematical Institute of the Hungarian Academy of Sciences. 
Vol. 1, Fasc. 1 and 2. 1956. [Subscriptions through the trade organization ‘Kultúra’, 
Sztálin út 21, Budapest VI, Hungary.] 


The Mathematical Institute of the Hungarian Academy of Sciences announces that this is their new 
name and that the above publication continues the series ‘Publications de l'Institut des Mathématiques 
Appliquée de l’Academie des Sciences de Hongrie’, of which three volumes have been published 
[vol. 1 (1952), vol. 2 (1953), vol. 3 (1954)]. The change of the title of their publications is a consequence 
of the corresponding change of the name of the Institute on 1 August 1955 from the Institute for 
Applied Mathematics of the Hungarian Academy of Sciences. 


Methods in Numerical Analysis. By K. L. NIELSEN. New York and London: The 
Macmillan Company. 1956. Pp. 382. 48s. 6d. 


‘Practically every university is now teaching a course in numerical analysis’, says Dr Nielsen in his 
preface, ‘and there is a need for an elementary textbook.’ This purpose his book fulfils. It has nine 
chapters, titled I, Fundamentals; II, Finite Differences; III, Interpolation; IV, Differentiation and 
Integration; V, Lagrangian Formulas; VI, Ordinary Equations and Systems; VII, Differential and 
Difference Equations; VIII, Least Squares and their Application; IX, Periodic and Exponential 
Functions. There are also answers to exercises, a bibliography, and nineteen tables, including eleven 
tables of coefficients for interpolation, differentiation and integration. 

Securing his rear at the start with a list of explanations of symbols such as =, > and, , Nielsen 
covers a good deal of ground, many topics being treated at length with mathematical proof and others 
(e.g. relaxation—three pages, one worked example) briefly but with references. Matrix inversion and 
the solution of sets of simultaneous linear equations are done exclusively by Crout’s method; mention 
might well have been made of the requirement for adjusting the elements of the matrix by powers of 
10 where their sizes are disparate. The treatment of least squares and data fitting (Chapters VIII and 
IX) will not serve adequately as a text for statistical computation, since it is directed solely to the 
evaluation of the regression constants. 

To the student, a particularly valuable feature of the book is the array of clearly explained and well 
laid out illustrative examples. It is also a useful reference book. T. LEWIS 


PUBLICATIONS OF THE U.S. DEPARTMENT OF COMMERCE, 
NATIONAL BUREAU OF STANDARDS 


(i) Tables of functions and zeros of functions. Applied Mathematics Series no. 37. 
1954. Pp. ix+211. $2.25. 


This is a book containing sixteen divers tables of special functions and zeros of special functions, 
chiefly related to Bessel Functions or which have been derived as ancillary to their computation. They 
have almost entirely appeared previously in the J. Math. & Phys. and the Bull. Amer. Math. Soc. 
[1942-9]. Those in the former journal, being designed as aids to physicists, have little statistical 
application, but we may mention Table 6, of the Struve functions, which should prove useful in tabulating 
the p.d.f.’s of simple forms of non-central χ². Tables 7 (of Fourier coefficients), 8 (of sin x and 
cos x for x = 100(1)1000), 9 (for finding logarithms to 25 places) and 10 (of x^n/n!) will be occasionally 
useful. The tables of aids to special quadrature methods should be of frequent assistance to the re- 
search statistician however. These are Tables 11 (of ‘the zeros of the Legendre polynomials of orders 
1-16 and weight coefficients for Gauss’ mechanical quadrature formula’) and 12 (of ‘zeros and weight 
factors of the first fifteen Laguerre polynomials’). Table 11 is, in fact, the now standard table 
of Lowan, Davids and Levenson. Each table is preceded by an introductory essay, with full references, 
especially noting articles reviewing the state of tabulation of the function and its relations. 


(ii) Contributions to the solution of systems of linear equations and the deter- 
mination of eigenvalues. Edited by OLGA TAUSSKY. Applied Mathematics Series 
no. 39. 1954. Pp. 139. $2.00. 


The symposium has rather more coherence than is usual in this type of publication; many of the 
contributions are complementary and all are, or contain, reviews of the literature. The book lacks an 
index and the list of contents is so meagre as to be misleading. L. Fox gives an exhaustive description 
of the ‘elimination’ and related methods of solution of linear equations, with a wealth of worked 
examples, and A. I. and G. E. Forsythe discuss punched card methods for the ‘accelerated gradient’ 
method. R. M. Hayes treats the convergence of a class of iterative processes for the solution of, inter 
alia, Fredholm integral equations and multiple integrals. Other contributions discuss the ‘condition’ 
of the n×n matrix $S_n = \|(i+j-1)^{-1}\|_{i,j=1,\dots,n}$ and table $S_n^{-1}$ for n=1, 2, ..., 10. Lower bounds for 
the rank of square matrices, location of latent roots (‘eigenvalues’) and bounds for them are also 
treated. 
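
The matrix with elements 1/(i+j−1) mentioned above is small enough for its inverse to be reproduced exactly. The following is a minimal sketch in Python, using exact rational arithmetic and Gauss-Jordan elimination; the function names `hilbert` and `invert` are illustrative assumptions, and nothing here is claimed to be the method by which the published tables were computed.

```python
from fractions import Fraction


def hilbert(n):
    """Return the n x n matrix with elements 1/(i+j-1), i, j = 1, ..., n."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def invert(matrix):
    """Invert a square matrix of Fractions exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = 1 / aug[col][col]
        aug[col] = [x * scale for x in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# The largest element of the inverse grows very rapidly with n,
# which is one way of seeing how badly conditioned these matrices are.
for n in range(1, 7):
    inverse = invert(hilbert(n))
    print(n, max(abs(x) for row in inverse for x in row))
```

Because the arithmetic is exact, the rapid growth of the elements of the inverse is a property of the matrix itself and not of rounding error.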


(iii) Table of the Descending Exponential. Applied Mathematics Series no. 46. 1955. 
Pp. 76. 56 cents. 


The negative exponential function, $e^{-x}$, is tabulated to 20 decimal places at intervals of 0.001 for 
x = 2.5 to x = 10. 
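
A reader wishing to check an individual entry of such a table can do so with arbitrary-precision arithmetic. The sketch below uses Python's standard `decimal` module; the function name `descending_exponential` and the ten guard digits are illustrative assumptions, not a description of how the table itself was prepared.

```python
from decimal import Decimal, localcontext


def descending_exponential(x, places=20):
    """Return exp(-x) rounded to `places` decimal places.

    `x` should be given as a string (e.g. "2.5") so that no binary
    floating-point error enters the argument.
    """
    with localcontext() as ctx:
        ctx.prec = places + 10          # working precision with guard digits
        value = (-Decimal(x)).exp()     # correctly rounded at working precision
        return value.quantize(Decimal(1).scaleb(-places))


# A few arguments spanning the tabulated range, at the tabular interval 0.001.
for arg in ("2.500", "2.501", "10.000"):
    print(arg, descending_exponential(arg))
```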







Tables of the Cumulative Binomial Probabilities. (Ordnance Corps Pamphlet 
PB 111389). Washington: U.S. Department of Commerce, Office of Technical Services. 
1952. Pp. viii+577. $6.00. 


Through the well-known relation 
$$\sum_{r=c}^{n} {}^{n}C_{r}\, p^{r}(1-p)^{n-r} = I_{p}(c,\; n-c+1),$$
where I is the incomplete B-function ratio, these tables may be used to give either the cumulative 
binomial probabilities or incomplete B-function values. The function is tabled to 7 decimal places for 
p = 0.00(0.01)0.50 and n = 1(1)150. 
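
The relation between the binomial tail and the incomplete B-function ratio is easily checked numerically. The following is a minimal sketch in Python: the left-hand side is summed directly, and the right-hand side is obtained by Simpson's rule applied to the integrand t^(a−1)(1−t)^(b−1). The function names and the choice of Simpson's rule are assumptions made for illustration; as noted below, the pamphlet gives no information on its own method of computation.

```python
from math import comb


def binomial_upper_tail(n, c, p):
    """Sum of nCr p^r (1-p)^(n-r) for r = c, ..., n."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(c, n + 1))


def incomplete_beta_ratio(p, a, b, steps=20000):
    """I_p(a, b) for positive integers a and b, by Simpson's rule.

    `steps` must be even. This is an illustrative check only, not an
    efficient or general-purpose routine.
    """
    def integrand(t):
        return t**(a - 1) * (1 - t)**(b - 1)

    def simpson(lower, upper):
        h = (upper - lower) / steps
        total = integrand(lower) + integrand(upper)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * integrand(lower + k * h)
        return total * h / 3

    # Ratio of the incomplete to the complete B-function integral.
    return simpson(0.0, p) / simpson(0.0, 1.0)


n, c, p = 20, 5, 0.3
print(binomial_upper_tail(n, c, p))
print(incomplete_beta_ratio(p, c, n - c + 1))
```

The two printed values should agree to well within the 7 decimal places of the tables.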


No information is given as to the method of computation. Previous tables of this kind include: 

(a) the National Bureau of Standards Tables (Applied Mathematics Series 6, 1950) which covered 
the range n= 2(1)49, and were derivable from Karl Pearson’s Tables of the Incomplete Beta-Function 
(Cambridge University Press, 1933). 

(b) H. G. Romig’s 50-100 Binomial Tables (Wiley, 1953) for which n= 50(5) 100. 

The present tables, therefore, cover important fresh ground for n> 100. 


CORRIGENDA 
Biometrika (1955), 42, pp. 531-3 
‘The likelihood ratio test for Markoff chains.’ By I. J. GOOD 


Mr Leo A. Goodman has been kind enough to allow me to see some work that he has not 
yet published in which, among other things, he points out that my paper in Biometrika, 
42 (1955), 531-3 contains a number of inaccuracies concerning the non-cyclic (non- 
circularized) case. In addition he mentions the following errata. 


Equation (7) should read K, = 2N log N, K_, = 2N log (Nt). 


In the last line § 5, Vy? should be replaced by Vy2,,. 


For further details the reader is referred to Mr Goodman’s forthcoming paper ‘Simplified 
group tests for Markoff chains’. 
I. J.G. 










OTHER BOOKS RECEIVED 


The Elements of Probability Theory and some of its Applications. By H. CRAMÉR. 
New York: John Wiley and Sons, Inc.; London: Chapman and Hall. 1955. Pp. 281. 
56s. 

Infinite Sequences and Series. By K. KNOPP, translated by F. BAGEMIHL. New York: 
Dover Publications Inc. 1956. Pp. 186. $3.50 cloth; $1.75 paper. 


Numerical Analysis, vol. VI [Proc. 6th Symposium in Applied Mathematics of the 
American Mathematical Society held 1953]. Edited by JOHN H. CURTISS. New York, 
Toronto and London: McGraw Hill Book Company Inc. 1956. Pp. 303. 73s. 


Statistical Mechanics. By T. L. HILL. New York: McGraw Hill Book Company Inc. 
1956. Pp. 392+ 41 pp. appendices and index. 67s. 6d. 


Nonparametric Statistics. By S. SIEGEL. New York: McGraw Hill Book Company Inc. 
1956. Pp. 239+ 74 pp. references and appendix. 49s. 


Immigrants and their Children, 1850-1950. By E. P. HUTCHINSON. New York: John 
Wiley and Sons Inc.; London: Chapman and Hall. 1956. Pp. xiv+391. 52s. 


Wahrscheinlichkeitstheorie. By H. RICHTER. Berlin: Springer-Verlag. 1956. Pp. xi+ 
435. DM.66. 


Indice Cronologico de Legislacion Estadistica 1813-1956. 

Resumen historico de la Estadistica en Espana. 

Publicaciones estadisticas de Espana. 
Published on the occasion of the Centenary of the Estadistica Oficial Espanola, 1856- 
1956, by Instituto Nacional de Estadistica, Madrid. 


Vocabulario Brasileiro de Estatistica. By MILTON DA SILVA RODRIGUES. Brasil: 
Universidade de Sao Paulo. 1956. Pp. 304. 


Petrographic Modal Analysis. By F. CHAYES. New York: John Wiley and Sons Inc.; 
London: Chapman and Hall. 1956. Pp. xii+113. 44s. 


Serie Cicliche ed Oscillanti. By FRANCESCO BRAMBILLA. Milan: Centro per la 
Ricerca Operativa, Universita Commerciale ‘Luigi Bocconi’. 1955. Pp. 320. 


Statistica. By FRANCESCO BRAMBILLA. Italy: La Goliardica. Vol. I, La Variabilita 
Strutturale. 1955. Pp. 672. Vol. II, La Teoria Della Stima. 1956. Pp. 688. Price 
together L. 9000. 


Scientific Inference (second edition). By Sir HAROLD JEFFREYS. Cambridge University 
Press. 1957. Pp. 236. 25s. 

Culture and the Structural Evolution of the Neural System. [James Arthur Lecture 
on the Evolution of the Human Brain 1955.] New York: The American Museum of 
Natural History. 1956. Pp. 57. 


The Lognormal Distribution, with special reference to its uses in economics. [University 
of Cambridge Department of Applied Economics Monograph 5.] By J. AITCHISON 
and J. A. C. Brown. Cambridge University Press. 1957. Pp. 176. 35s. 













TRACTS FOR COMPUTERS 


Department of Statistics, University College, London 


I. Tables of the Digamma and Trigamma Functions. By ELEANOR PAIRMAN, M.A. 
Tables for summing $S = \sum_{i}^{\infty} \frac{1}{(p_1 i+q_1)(p_2 i+q_2)\cdots(p_n i+q_n)}$, where the p’s and q’s are numerical 
factors. Price 5s. net. 


V. Table of Coefficients of Everett’s Central-Difference Interpolation Formula. By A. J. 
THOMPSON, PH.D. Second edition. Price 7s. 6d. net. 


VII. Table of the Logarithms of the Complete Γ-Function (to ten decimal places) for 
Argument 2 to 1200 beyond Legendre’s Range (Argument 1 to 2). By EGon S. PEARSON, 
D.Sc. Price 5s. net. 


IX. Log Γ(x) from x=1 to 50.9 by intervals of 0.01. By JOHN BROWNLEE, M.D., D.Sc. 
Price 5s. net. 


X. On Quadrature and Cubature or on Methods of Determining Approximately Single 
and Double Integrals. By J. O. Irwin, D.Sc. Price 7s. 6d. net. 


XII. Tables of the Probable Error of the Coefficient of Correlation. By KARL HOLZINGER, 
PH.D. Price 5s. net. 


XIII. Bibliotheca Tabularum Mathematicarum, being a Descriptive Catalogue of Mathematical 
Tables. Part I. A, Logarithms of Numbers. By JAMES HENDERSON, PH.D. Price 9s. net. 


XV. Random Sampling Numbers. By L. H. C. Tippett, M.Sc., with a Foreword by KARL 
PEARSON. Price 5s. net. 


XXIII. Tables of tan⁻¹x and log(1+x²). To assist in the calculation of the ordinates of a Pearson 
Type IV curve. By L. J. COMRIE, PH.D. Price 5s. net. 


XXIV. Random Sampling Numbers (2nd Series). By M. G. KENDALL and B. BABINGTON SMITH. 
Price 5s. net. 


XXV. Random Normal Deviates. By HERMAN WOLD. Price 5s. net. 


XXVI. Correlated Random Normal Deviates. By E. C. FIELLER, T. LEWIS and E. S. PEARSON. 
Price 10s. 6d. net. 


Nos. II, III, IV, VI and VII are out of print 







LOGARITHMETICA BRITANNICA 


A standard Table of Logarithms to Twenty Decimal Places. By A. J. THOMPSON, Ph.D. 
(commenced in 1922 to commemorate the tercentenary of the publication of HENRY BRIGGS’s 
Arithmetica Logarithmica). 

The nine separate sections of this Table have now been issued, and the complete work 
consisting of the logarithms of numbers 10,000-100,000, together with Dr Thompson’s 
General Introduction (98 pp.) is now available in two bound volumes. 


Price £8. 8s. 0d. 







issued by the CAMBRIDGE UNIVERSITY PRESS, Bentley House, LONDON, N.W.1 
on behalf of the 
DEPARTMENT OF STATISTICS, UNIVERSITY COLLEGE, LONDON 
and obtainable from any bookseller 








All rights reserved 


BIOMETRIKA Vol. 44, Parts 1 and 2 
CONTENTS 


JOHN WISHART, 1898-1956 
P. ARMITAGE. Restricted sequential procedures 
M. S. BARTLETT. On theoretical models for competitive and predatory biological systems 
V. T. PATIL. The consistency and adequacy of the Poisson-Markoff model for density fluctuations 
E. J. HANNAN. Testing for serial correlation in least squares regression 
S. KULLBACK and H. M. ROSENBLATT. On the analysis of multiple regression in k categories 
R. L. BROWN. Bivariate structural relation 
JOHN W. WILKINSON. An analysis of paired comparison designs with incomplete repetitions 
C. L. MALLOWS. Non-null ranking models. I 
J. AITCHISON and S. D. SILVEY. The generalization of probit analysis to the case of multiple responses 
S. C. PEARCE. Experimenting with organisms as blocks 
D. R. COX. The use of a concomitant variable in selecting an experimental design 
M. G. BULMER. Approximate confidence limits for components of variance 
D. E. BARTON and F. N. DAVID. Multiple runs 
D. V. LINDLEY. Binomial sampling schemes and the concept of information 
D. V. LINDLEY. A statistical paradox 
H. W. HASKEY. Stochastic cross-infection between two otherwise isolated groups 
J. K. MACKENZIE and M. J. THOMSON. Some statistics associated with the random disorientation of cubes 
J. H. DARWIN. The difference between consecutive members of a series of random variables arranged in order 
of size 
B. I. HARLEY. Relation between the distributions of non-central t and of a transformed correlation coefficient 
A. CLIFFORD COHEN, JR. On the solution of estimating equations for truncated and censored samples from 
normal populations 
F. G. FOSTER and D. H. REES. Upper percentage points of the generalized Beta distribution. I 
MISCELLANEA 
E. S. PAGE. On problems in which a change in a parameter occurs at an unknown point 
D. J. BARTHOLOMEW. Testing for departure from the exponential distribution 
B. I. HARLEY and E. S. PEARSON. The distribution of range in normal samples with n= 200 
M. G. KENDALL. Studies in the history of probability and statistics. V. A note on playing cards 
ALAN STUART. A singularity in the estimation of binomial variance 
AUREL WINTNER. Student’s distribution and Riemann’s elliptic geometry 
JOHN GURLAND. Some interrelations among compound and generalized distributions 
M. S. BARTLETT. A note on tests of significance for linear functional relationships 
M. G. KENDALL. The moments of the Leipnik distribution 
M. H. QUENOUILLE. The effect of transformations of variables upon their correlation coefficients 
B. I. HARLEY. Further properties of an angular transformation of the correlation coefficient 
FRANKLIN A. GRAYBILL and JOHN LEROY FOLKS. ery of error variances in a randomized block 
design 
G. P. SILLITTO. An extension property of a class of balanced incomplete block designs 
N. L. JOHNSON. Sequentially determined confidence intervals 
H. T. DAVID. Estimation of means of normal populations from observed minima 
J. H. DARWIN. The power of the Poisson index of dispersion 
H. O. LANCASTER. Some properties of the bivariate normal distribution considered in the form of a 
contingency table 
REVIEWS 
L. SCHMETTERER’s ‘Einführung in die mathematische Statistik’ 
SIR RONALD A. FISHER’s ‘Statistical Methods and Scientific Inference’ 
H. A. MEYER (ed.), ‘Symposium on Monte Carlo Methods’ 
F. G. CORNELL’s ‘The Essentials of Educational Statistics’ 
W. A. NEISWANGER’s ‘Elementary Statistical Methods’ 
W. ALLEN WALLIS and HARRY V. ROBERTS’s ‘Statistics: a New Approach’ 
R. B. GOLDSCHMIDT’s ‘Theoretical Genetics’ 
M. G. KENDALL’s ‘Rank Correlation Methods’ 
A. N. KOLMOGOROV’s ‘Probability’ 
O. PERRON’s ‘Irrationalzahlen’ 
A. ZYGMUND’s ‘Trigonometrical Series’ 
Publications of the Mathematical Institute of the Hungarian Academy of Sciences 
K. L. NIELSEN’s ‘Methods in Numerical Analysis’ 
NATIONAL BUREAU OF STANDARDS, Publications of the U.S. Department of Commerce, Applied Mathematics 
Series 37, 39, 46 and Ordnance Corps Pamphlet PB 111389 
CORRIGENDA 
OTHER BOOKS RECEIVED 


Printed in Great Britain at the University Press, Cambridge (Brooke Crutchley, University Printer) 










